In .NET, the System.Object.GetHashCode method is used in many places throughout the base class libraries, especially for finding items in a collection quickly and for determining equality. Is there a standard algorithm or best practice for implementing the GetHashCode override in my custom classes so that I don't degrade performance?

After reading this question and the article below, I was able to implement an override of GetHashCode. I hope it will be helpful for others: Guidelines and rules for GetHashCode, written by Eric Lippert – rene
"or to determine equality": no! Two objects with the same hashcode are not necessarily equal. – Thomas Levesque
@ThomasLevesque You are right, two objects with the same hash code are not necessarily equal. But GetHashCode() is still used in very many implementations of Equals(). That's what I meant by that statement. GetHashCode() inside Equals() is often used as a shortcut to determine inequality, because if two objects have different hash codes they cannot be equal, and the rest of the equality check doesn't have to be executed. – bitbonk
@bitbonk Usually, both GetHashCode() and Equals() need to look at all fields of both objects (Equals has to do this if the hash codes are equal or not checked). Because of this, a call to GetHashCode() inside Equals() is often redundant and could reduce performance. Equals() may also be able to short-circuit, making it much faster - however in some cases the hash codes may be cached, making the GetHashCode() check faster and worthwhile. See this question for more. – NotEnoughData
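
A sketch of the pattern being discussed (illustrative only - MyType and its fields are hypothetical, and the shortcut only pays off if the hash code is cached):

public override bool Equals(object obj)
{
    var other = obj as MyType;
    if (other == null) return false;
    // Optional shortcut: different hash codes imply inequality.
    // Only worthwhile if GetHashCode() is cheap (e.g. cached);
    // otherwise it re-reads the same fields Equals is about to compare.
    if (GetHashCode() != other.GetHashCode()) return false;
    return field1 == other.field1 && field2 == other.field2;
}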

16 Answers

Accepted answer (1264 votes)

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least some similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime, though. I don't know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
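
To see both problems concretely (XorHash here is a hypothetical helper standing in for the XOR practice):

// Hypothetical XOR-combining hash for a type with two int fields.
static int XorHash(int x, int y)
{
    return x ^ y;
}

// Any pair of equal fields collapses to 0, and swapping the fields never changes the hash:
Console.WriteLine(XorHash(42, 42)); // 0
Console.WriteLine(XorHash(7, 13));  // 10
Console.WriteLine(XorHash(13, 7));  // 10 - same hash as (7, 13)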

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough", and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as the combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modification to perform one iteration per byte instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

One thing to be aware of: ideally, you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding the object to a collection that depends on its hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
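
For example, a minimal sketch (the Person type and its members are illustrative, not from the documentation) that keeps the hash stable while the object mutates:

public sealed class Person
{
    private readonly int _id;        // immutable - safe to hash
    public string Name { get; set; } // mutable - kept out of the hash

    public Person(int id, string name)
    {
        _id = id;
        Name = name;
    }

    // Equals (not shown) should compare the same field(s) the hash uses.
    public override int GetHashCode()
    {
        return _id.GetHashCode();
    }
}
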
The algorithm described in the book you mention is in fact a little more detailed; it especially describes what to do for different data types of the fields. E.g. for fields of type long, use (int)(field ^ (field >>> 32)) instead of simply calling GetHashCode. Is long.GetHashCode implemented that way? – bitbonk
Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me - time to add a link to the book... – Jon Skeet
23 is no good choice, since (as of .NET 3.5 SP1) Dictionary<TKey,TValue> assumes good distribution modulo certain primes, and 23 is one of them. So if you have a dictionary with Capacity 23, only the last contribution to GetHashCode influences the compound hashcode. So I'd rather use 29 instead of 23. – CodesInChaos
@Ani: Your implementation allocates several new objects on the heap, so the performance might be lower than with a manual implementation. Whether that's acceptable depends on your type and usage. Check some of the other answers for helpers using generics which avoid this problem. – CodesInChaos
@CodeInChaos: Only the last contribution influences the bucket - so it might, at worst, have to look through all 23 entries in the dictionary. It's still going to check the actual hash code of each entry, which will be cheap. If you've got a dictionary that small, it's unlikely to matter much. – Jon Skeet
@Jon: I have to ask, despite already having opened my own question on the topic: what's a good VB version of this, since VB lacks the checked and unchecked keywords? I tried making tmpHash an Int64 and AND'ing the lower 8 bits (per the accepted answer to my question), but on a sufficiently large set of fields, it somehow caused the calculation to wrap to 0 for the remainder of the loop. – Kumba
@Kumba: I don't know how I'd do this in VB, I'm afraid. Is arithmetic always checked in VB? Could you have a separate class library which you could delegate the arithmetic to, either written in C# or with checked arithmetic turned off project-wide? – Jon Skeet
@Jon: VB apparently checks a lot of things. It's got a fetish for demanding that unsigned numerics get converted into signed numerics before letting you left or right shift them. Which is driving me up the wall and across the ceiling. I'm trying to implement the Jenkins hash to work around the lack of checked/unchecked (a rotating hash also does the trick, but I'm worried about hash collisions with input). Using a separate, C# library is something I'd like to avoid, because it essentially admits defeat. If I get to that point, I should just re-write the entire project in C#. – Kumba
isn't the 'unchecked' unnecessary b/c CLR will happily overflow by default? – pomeroy
@pomeroy: It depends on what the project settings are. Basically you give the assembly a default context of checked or unchecked. – Jon Skeet
@pomeroy: VB isn't as granular as C# is. Because it lacks the above two keywords, your only options are to remove integer overflow checks for the entire project or not. I guess if your project is complete and generally well tested, removing the overflow checks is a safe thing to do. However, while building and debugging it, those checks are good because they'll highlight bugs to fix. I opened Connect ticket #636564 with Microsoft to recommend including checked/unchecked keyword support in the next .NET release. Uncertain if they'll do so, though. – Kumba
I'll add that I'll have to go with the rotating hash algorithm linked to in Jon's answer above. It doesn't overflow, even on an Int32, doesn't (so far) wrap to 0 on a large number of fields in the calculation, and is simple and rather speedy. The Jenkins hash didn't work out... even that overflows at random, depending on the input. Also, the forcing of bit shifts to happen in signed math gets in the way of a lot of things. I might open another bug on that, unless that's intended behavior somehow. – Kumba
Don't you need override in your method declaration? Would also be good to put in null checks since this is such a well-used example. – Rory
@Rory: I've added the override, thanks - I'm not going to put in the null checks, as I feel that would obscure the important points. The comment suffices IMO. – Jon Skeet
Why start with a prime instead of just zero? Does int hash = 17; have any theoretically supported benefits? – fredoverflow
@FredOverflow: I don't know the exact details of all the reasons behind it, but starting with 0 would mean that the hash would stay as zero if the individual field hashes were zero... and that's probably not uncommon (e.g. an integer of value zero will probably hash to zero). Just a guess, but I suspect that having a constant which gets propagated with each field is useful. This is really just copied from Effective Java though :) – Jon Skeet
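
A quick illustration of that propagation (assuming a hypothetical type whose fields all hash to 0):

unchecked
{
    // Seed 0: stays 0 no matter how many zero-hash fields are mixed in.
    int zeroSeed = 0;
    zeroSeed = zeroSeed * 23 + 0; // 0
    zeroSeed = zeroSeed * 23 + 0; // still 0

    // Seed 17: the constant keeps propagating through each step.
    int primeSeed = 17;
    primeSeed = primeSeed * 23 + 0; // 391
    primeSeed = primeSeed * 23 + 0; // 8993
}
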
@JonSkeet How collision-safe would this algorithm be for a complex object graph consisting of, let's say, 500 objects each having 10 properties? Related question: //allinonescript.com/questions/5308057/… – bitbonk
@bitbonk: The chances of a collision on any single change would be pretty low... but in the question you're talking about I'd probably use a cryptographic hash instead. – Jon Skeet
Then the question stands: how do I create a cryptographic hash of an object model? – bitbonk
@bitbonk: I would strongly consider using a "normal" cryptographic hash on the result of binary serialization. – Jon Skeet
This algorithm is basically the DJB2 string hashing algorithm, for which the constants 5381 and 33 are recommended (cse.yorku.ca/~oz/hash.html). To be honest I'm not sure the constant makes much difference, but the multiplier is important. – yoyo
@JonSkeet I realize I'm raising the dead here, but implementing hashes is a new thing for me. In your implementation, what fields am I including in the hash? Only immutable ones, or are any fields good? – KChaloux
@KChaloux: That entirely depends on what you want equality to mean. Normally it's a bad idea to include mutable data though. – Jon Skeet
How would you handle nullity? If simply ignoring that field, then for A = null, B = "ss" and for A = "ss", B = null we would have collisions. Isn't it better to multiply each field by a different prime number? – Vajda
@Vajda: I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field. – Jon Skeet
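
In code, that convention is just a null test in each combining step (field1 being whichever field may be null):

hash = hash * 23 + (field1 == null ? 0 : field1.GetHashCode());
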
@jnm2: I don't follow your argument, to be honest. In particular, I've just tried this effectively hashing 10 fields - and changing the value of just the first field still changed the hash, contradicting your claim that "every bit of the first hash codes will be lost". – Jon Skeet
You can demonstrate this gives a poor distribution quite simply. Take this variant of FNV and apply it to strings (use unsafe pointer manipulation to get integers at a time to give it a fair chance). Use it to add strings to a power-of-two based hash table. With the one I'm working on now, if I generate "1", "2", ... "999999" and add them in, it takes about 34 seconds. Now take this same hash method and re-hash the result with a well-distributed hash. With a good hash this can only make things worse (more time spent, and we could introduce new collisions but never remove them). With the... – Jon Hanna
... same hash table I'm working on, the same code to generate "1"..."999999" and add them takes 1 second. The effect is less pronounced with prime-number based hashes, so in that case the extra time spent re-hashing (and perhaps the reduction in possible outcomes, though that's unlikely) doesn't gain anything, but the poor performance on power-of-two tables demonstrates poor distribution generally. – Jon Hanna
@JonHanna: Thanks for that. Not sure what you mean by "to get integers at a time" but I'll try to have a closer look. I still like this as a first approximation for a hash, but if you have another hash which is as simple to remember and get right but with a better distribution, I'd be very happy to change my practice :) – Jon Skeet
I meant I used fixed(char* ptr = str){int* iPtr = (int*)ptr;... but I also tried just doing foreach(char c in str) and casting each char to int, and the same applies. The relative weakness became apparent to me when I had reason to use power-of-two tables and was getting poor results (I used to use much the same as above, myself). The solution I finally hit upon was to forget about being easy to remember, build a hard-to-remember method once, and then make it easy to use; I put the code at nuget.org/packages/SpookilySharp. I'll add a full answer here at lunchtime. – Jon Hanna
@JonSkeet and now answered. – Jon Hanna
@JonHanna: Thanks for that. Will have to look in more detail when I have a bunch of time :) – Jon Skeet
I think it is important to point out that we should be careful with the hash code changing at run-time. We had a bug in my project because the previous developer had implemented a GetHashCode algorithm based on this answer. But in his implementation he had a list of objects, and he used the hash of each item in the collection to generate the object's hash code. Therefore, when the collection changed, the hash code changed too. That was causing binding problems in WPF. And if you had the object in a dictionary, for example, you would get errors too. – Dzyann
@Dzyann: Yes, mutating the key in a way which affects equality - and thus the hash code - is always a bad idea. I'll add a note. – Jon Skeet
@JonSkeet You are right, and it can lead to bugs that are very hard to track down. Like in this case with the WPF bindings. It took ages until one of my coworkers found the cause and solved it. Since it wasn't our code, it was very challenging. – Dzyann
I would suggest that you change 17 and 23 to the constants here. (Thanks for the link.) It gave a simple dictionary lookup much higher performance, in my case, ~60% better. – jnm2
@jnm2: That's not the same algorithm to start with - it's using XOR rather than ADD. I'll stick with these constants for this answer, but perhaps you should add your own answer? – Jon Skeet
Actually, I was going to suggest that xoring rather than adding wouldn't diminish the simplicity as a go-to hash algorithm. What do you think? – jnm2
XOR ends up making GetHashCode() faster by 12% in my case. – jnm2
@jnm2: Well it wouldn't diminish that simplicity - but it's not what I've done for the past several years. I'll add FNV as an alternative though. – Jon Skeet
int hash = 2166136261; Is there a cast missing? The compiler says that 2166136261 is a uint... I changed it to int hash = (int)2166136261; – Roman Ganz
@knight_killer: Fixed, thanks. – Jon Skeet
I tried to implement this approach for ValueUtils, but in my testing this variant of FNV caused considerable collisions (24%) in certain symmetrical datasets. And perhaps that's because this is NOT actually the FNV hash? Traditional FNV hashes per octet (byte), not per 32-bit word. That gives this variant less opportunity to mix the bits around... – Eamon Nerbonne
@EamonNerbonne: What do you mean by "this approach"? The answer now contains two different versions... – Jon Skeet
I mean this variant of FNV - it's not quite FNV, and I'm pretty sure that makes matters worse. Incidentally, I also tried the h=prime; repeat h=h*prime + ? recipe; that seems to vary: it does quite well for larger primes, especially if your intermediate is 64 bits wide. – Eamon Nerbonne
@Eamon: I don't know enough about the theory to comment further, I'm afraid :( – Jon Skeet
Yeah, the theory behind it is not at all obvious to me. However, this answer suggests that this implementation is FNV, a well-known good hash - but that's not really true, since this is not FNV. Also, FNV is a string hash algorithm, which needs to satisfy much trickier requirements in that it needs to work for variable-length, potentially long strings. But again, the algorithm currently in the answer is not FNV - it mixes the bits much less well. – Eamon Nerbonne
@EamonNerbonne: Okay. I'll edit to indicate that it's a modification, and that it doesn't work as well in at least some cases. – Jon Skeet
@EamonNerbonne: What are the best coefficients that you know of? – jnm2
@jnm2 In my experiments, the offset matters little, and the trend is that larger primes perform better, with the caveat that this is all tricky to test because it's slow (really slow) to be thorough, and it depends on the way in which your dataset is "messed up". If your fields have perfectly randomly distributed hashcodes - none of this matters, but of course, in the real world, those hash codes aren't random, and fields are correlated. There's a fairly good reason why large primes would be better too - they mix bits better, especially if your data consists largely of small numbers. – Eamon Nerbonne
@jnm2, so I'd pick a largish prime (on the order of 2^16, say) and to tune for .NET's dictionary implementation, one that's NOT used by Dictionary<,>: referencesource.microsoft.com/#mscorlib/system/collections/… – Eamon Nerbonne
@jnm2 I came across these two questions further exploring this issue: //allinonescript.com/questions/1835976/… and //allinonescript.com/questions/1145217/… and both conclude: use any old large prime. The accepted answer in the first question mentions two chosen in a principled way - but it's not likely a principle that really relates to the real world, so it still recommends the basic idea: pick a large prime, NOT 23 or 31. – Eamon Nerbonne
BTW: note that the offset is (as far as I can tell) completely pointless. Distributive laws also hold under modulo, and that means it's simply an identical offset all objects will share - that certainly has no impact on any hashtable I know of. – Eamon Nerbonne
@EamonNerbonne: I guess that's true if all the objects are of the same type. If you've got a dictionary where some keys are subclasses of other keys, that could make a difference... although only when the extra field values are 0 anyway. Again, this is mostly just habit for me :( – Jon Skeet
@JonSkeet Yeah, if you have objects of differing type and you use differing offsets, you'll have some advantage. No reason to be prime, though, I think... In any case, an addition is so cheap, there's not much reason to avoid it either. – Eamon Nerbonne
I used this algorithm for a pseudo-random generator, and it behaves a little strangely: //allinonescript.com/questions/26847262/… – Max Yankov
If you got the number 486187739 from //allinonescript.com/a/2816747/21499 - I actually intended to recommend 92821. – Hans-Peter Störr
As each instance of class 'object' has a unique hash code, this idea came to my mind: it would be good if we used base.GetHashCode() as a seed or something to produce our hash code for the object. – Ahmad Siavosh
@AhmadSiavosh: No, that's a bad idea, because you want distinct but equal objects to have the same hash code. (I don't think object.GetHashCode is guaranteed to be unique, either. It may well be "very unlikely to collide" but that's not the same thing.) – Jon Skeet
If a field fieldL is a List<obj>, will it work by just doing hash = ... ^ fieldL.GetHashCode(), or do I have to go through the items like foreach(){hash = ... ^ item.GetHashCode()}? – Jaider
@Jaider: It won't do either. List<T> doesn't override Equals or GetHashCode. – Jon Skeet
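
If element-wise equality is what you want, a hedged sketch of folding the items into the hash instead (fieldL as in the comment above; null items mapped to 0; note the hash then depends on the list's contents and order, so they must not change while the object is used as a key):

foreach (var item in fieldL)
{
    hash = unchecked(hash * 23 + (item == null ? 0 : item.GetHashCode()));
}
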
I tried this code for 3 doubles, and I get an enormous amount of collisions. I need to get hashcodes for 4194304 tuples. Is there a better way? Using some larger primes helped a little bit, but I am still getting collisions. – user984444
@user984444: Well you should expect quite a few collisions with that many entries. Just how many are you getting? – Jon Skeet
@JonSkeet It's hard to say. I am using this to cache the output of some Perlin noise, and the indicator of collision is some "interesting" output in my image; it has the appearance of... when you win solitaire. This gets mitigated (and the pattern changes) with larger primes. Very unhelpful, I know. I have changed my struct (tuple of doubles as the key) into a class so that .NET takes care of the hash code for me, and it has no collisions anymore. – user984444
@user984444: Um, that way equal keys won't be equal though, unless you've overridden GetHashCode in your class, in which case you've got the same problem. Maybe it would be worth you asking a new question with all the details... – Jon Skeet
@JonSkeet: Not true; the default implementation of GetHashCode is working perfectly well (If it wasn't, it would be incredibly obvious in my end result). It works for a struct as well, but is WOEFULLY SLOW. I wanted to use structs, but using a class seems to work just fine for my use case. – user984444
@user984444: Unless you override GetHashCode and Equals yourself, or inherit from another class which does so, you will get reference equality. That's not what the struct will give you. It really, really sounds like we need a new post here with details. – Jon Skeet
@JonSkeet: I consider my particular problem solved because I am getting the desired result, but if I get a chance, I'll post a question with details so you can see what is going on. – user984444
Being really picky, StyleCop's default settings generate a warning for this code (SA1407), as you have not used parentheses to make the precedence of the arithmetic operators explicit, even though it is clear to any dev reading the code, and to the compiler, as we all know the BODMAS rule. – MikeW
@MikeW: I don't think BODMAS includes XOR :) I think the final piece of code would be clearer with parentheses - will add them now. I agree that the multiply-and-add version doesn't need them. – Jon Skeet
I didn't know about "unchecked" - thanks for that! – Ben
For future readers: Consider using HashCode.Combine() – RobIII
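
On newer frameworks (System.HashCode ships with .NET Core 2.1+ and is available down-level via the Microsoft.Bcl.HashCode package), that looks like this (field names illustrative):

public override int GetHashCode()
{
    // Combines up to eight values using a per-process randomized seed.
    return HashCode.Combine(field1, field2, field3);
}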

Most of my work is done with database connectivity which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hashcode.

// Unique ID from database
private int _id;

public override int GetHashCode()
{
    return _id.GetHashCode();
}
That means that if you have objects Person and Account and they both have an ID = 1, they will have the same hash code. And that is not OK. – Petar Repac
Actually the comment above is incorrect. There will always be the possibility of hash-code collisions (a hash code only locates the bucket, not the individual object). So such an implementation - for a hashcode containing mixed objects - would lead to a lot of collisions, which is undesirable, but it would be absolutely fine if you only ever had objects of a single type in your hashtables. Also it doesn't distribute evenly, however neither does the base implementation on system.object, so I wouldn't worry about it too much... – piers7
The hash code can just be the id, since the id is an integer. There's no need to call GetHashCode on an integer (it's an identity function) – Darrel Lee
@DarrelLee but tomorrow his _id could be a Guid. It's good coding practice to do _id.GetHashCode, as the intent is clear. – nawfal
@DarrelLee, it is not a good option, because sequential IDs from the database don't provide a good distribution. – Aleksey Bykov
@1224 depending on use patterns it can be horrible for the reason you give, but it can also be great; if you've a sequence of such numbers with no holes, then you've a perfect hash, better than any algorithm can produce. If you know that's the case you can even count on it and skip the equality check. – Jon Hanna

I have a Hashing class in a helper library that I use for this purpose.

/// <summary> 
/// This is a simple hashing function from Robert Sedgewick's Hashing in C book.
/// Also, some simple optimizations have been added to the algorithm
/// in order to speed up its hashing process. From: www.partow.net
/// </summary>
/// <param name="input">Array of objects: the combination of parameters
/// that you need a hash code for</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

Then you can simply use it as:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

I didn't assess its performance, so any feedback is welcome.

Well, it will cause boxing, if fields are value types. – nightcoder
+1 for the null check that most other (otherwise perhaps better) answers have not included. – Mark Hurd
"can be enhanced later by catching the OverflowException" The whole point of the unchecked is to avoid exceptions on overflow which is desired on GetHashCode. So it's not incorrect if the value overflows int and it does not hurt at all. – Tim Schmelter
One issue with this algorithm is that any array full of nulls will always return 0, regardless of its length. – Nathan Adams
This helper method also allocates a new object[] – James Newton-King
As @NathanAdams mentions, the fact that null is skipped entirely could give you unexpected results. Instead of skipping them, you should just use some constant value instead of input[i].GetHashCode() when input[i] is null. – David Schwartz

In most cases where Equals() compares multiple fields, it doesn't really matter whether your GetHashCode() hashes on one field or on many. You just have to make sure that calculating the hash is really cheap (no allocations, please) and fast (no heavy computations, and certainly no database connections), and that it provides a good distribution.

The heavy lifting should be part of the Equals() method; the hash should be a very cheap operation to enable calling Equals() on as few items as possible.

And one final tip: don't rely on GetHashCode() being stable over multiple application runs. Many .NET types don't guarantee their hash codes to stay the same after a restart, so you should only use the value of GetHashCode() for in-memory data structures.
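
For instance (a minimal illustration; on .NET Core, string hash codes are randomized per process, and even on .NET Framework the value is not guaranteed across versions):

// May print a different value on every run - never persist this
// to a file or database, and never send it across process boundaries.
Console.WriteLine("hello".GetHashCode());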

"In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many." This is dangerous advice, because for objects which only differ in the un-hashed fields, you will get hash collisions. If this happens frequently, the performance of hash-based collections (HashMap, HashSet etc.) will degrade (up to O(n) in the worst case). – sleske
This actually happened in Java: in early versions of the JDK, String.hashCode() only considered the beginning of the string; this led to performance problems if you used Strings as keys in HashMaps which only differed at the end (which is common e.g. for URLs). The algorithm was therefore changed (in JDK 1.2 or 1.3, I believe). – sleske
If that one field 'provides a good distribution' (last part of my answer), then one field is enough. If it doesn't provide a good distribution, then (and just then) you need another calculation. (E.g. just use another field that does provide a good distribution, or use multiple fields.) – Bert Huijben
I don't think there's a problem with having GetHashCode perform memory allocations, provided that it only does so the first time it's used (with subsequent invocations simply returning a cached result). The important thing is not that one should go to great lengths to avoid collisions, but rather that one should avoid "systemic" collisions. If a type has two int fields oldX and newX which frequently differ by one, a hash value of oldX^newX would assign 90% of such records hash values of 1, 2, 4, or 8. Using oldX+newX [unchecked arithmetic] might generate more collisions... – supercat
...than would a more sophisticated function, but a collection of 1,000,000 things that have 500,000 different hash values will work very well if each hash value has two associated things, and very badly if one hash value has 500,001 things and the others have one each. – supercat

Here is my hashcode helper.
Its advantage is that it uses generic type arguments and therefore will not cause boxing:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
         unchecked
         {
             return 31 * arg1.GetHashCode() + arg2.GetHashCode();
         }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items 
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }
}

Also, it has an extension method to provide a fluent interface, so you can use it like this:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

or like this:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}
No need for T[] separately as it is already IEnumerable<T> – nawfal
You could refactor those methods and restrict the core logic to one function – nawfal
Incidentally, 31 is a shift and subtract on the CPU, which is exceedingly fast. – Chui Tey
An extension method on int is nasty namespace pollution - @safak-gur's answer below nicely side-steps that problem. – Eamon Nerbonne
@nightcoder you could use params. – ANeves
@ChuiTey This is something all Mersenne Primes have in common. – Pharap
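
Concretely (wrapping arithmetic assumed), multiplying by a Mersenne prime such as 31 can be rewritten as a shift and a subtract, which is why it is so cheap:

int h = 12345;
// 31 * h == 32 * h - h == (h << 5) - h, even under overflow.
bool same = unchecked((31 * h) == ((h << 5) - h)); // true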

This is a good one:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

And here is how to use it:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}
@Magnus, can you explain why it's a good one? – David Rutten
How are the Keys determined? GetHashCode() doesn't take any parameters, so it needs to call this one with two Keys that need to be determined somehow. Sorry, without further explanation this only looks clever, but not that good. – Michael Stum
It's the hash code generation part of my hash code helper class. – Magnus
I'll post the complete code. – Magnus
And why do you need the generic overloads? The type is not important (and not used in your code) since all objects have a GetHashCode() method, so you can always use the method with the params array parameter. Or am I missing something here? – gehho
It's about performance: avoiding the loop for <= 4 fields. But I guess the generics could be skipped and object used instead. – Magnus
When you use object instead of generics you get boxing and memory allocations, which you don't want in GetHashCode. So generics are the way to go. – CodesInChaos
The trailing shift/xor steps (h += (h << 10); h ^= (h >> 6); h += (h << 3); h ^= (h >> 11); h += (h << 15);) have a code smell: they do not depend on any of the input and look awfully redundant to me. – sehe
@nawfal what speed considerations do you have? – Magnus
@Magnus nothing in particular other than the general rule that hashing must be fast. This can't be as fast as I would love it to be. But as I said this gives better distribution of values which may be suitable for some cases. – nawfal
@nawfal Running this 100 million times takes about 390 ms. Running the solution suggested by Jon Skeet 100 million times takes about 320 ms so it is not a huge difference. – Magnus
@Magnus yes right, I'll delete my original comment. Just a little note that this may not be as fast as some other solutions here, but as you say shouldn't matter. The distribution is great, better than most solutions here, so +1 from me! :) – nawfal

Microsoft already provides a good generic HashCode generator: Just copy your property/field values to an anonymous type and hash it:

new { PropA, PropB, PropC, PropD }.GetHashCode();

This will work for any number of properties. It does not box the values, and it just uses the algorithm already implemented in the framework for anonymous types (though, as the comments note, it does allocate a new anonymous-type instance on each call).
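
In an override (property names as in the snippet above), that would be:

public override int GetHashCode()
{
    // Delegates to the compiler-generated anonymous-type hash.
    return new { PropA, PropB, PropC, PropD }.GetHashCode();
}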

Yes, anonymous GetHashCode implementation is very effective (BTW it's the same as the one in the Jon Skeet's answer), but the only problem with this solution is that you generate a new instance at any GetHashCode call. It can be a bit overhead-ish in particular in case of intensive access to big hashed collections... – digEmAll
This works in VB w/ .NET 4.0, but looking over the IL, it's using box calls since the type uses generics. No unboxing, but from my reading here, the mere presence of boxing suggests this can be a little inefficient. Seems like the only choice for VB, though, since there is no equivalent of checked/unchecked. – Kumba
@digEmAll Good point, I didn't think about the overhead of creating an new object. Jon Skeet's answer is the most efficient and won't use boxing. (@Kumba To solve the unchecked in VB, just use a Int64 (long) and truncate it after the calculations.) – Rick Love
could just say new { PropA, PropB, PropC, PropD }.GetHashCode() too – sehe
In VB.Net: New With {PropA, PropB, PropC, PropD}.GetHashCode() – mwolfe02
VB.NET must use Key in anonymous type creation: New With {Key PropA}.GetHashCode() Otherwise GetHashCode will not return the same hashcode for different objects with the same 'identifying' properties. – David Osborne
Don't forget to enumerate your IEnumerables or bad things will happen. new { PropA, PropB, C = PropC.ToList() }.GetHashCode() – Keith
@Keith in that case, I would consider saving the IEnumerable as a list value somewhere instead of enumerating it each time the hash code is calculated. Calculating ToList each time inside GetHashCode could hurt performance in many situations. – Rick Love
And don't forget that private property/fields are not needed in this case ;). – shA.t

Here is my simplistic approach. I am using the classic builder pattern for this. It is typesafe (no boxing/unboxing) and also compatible with .NET 2.0 (no extension methods, etc.).

It is used like this:

public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
} 

And here is the actual builder class:

internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            this.result = this.result * Prime2 + item.GetHashCode();
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, 
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3, 
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }        

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}
You can avoid the object creation inside the GetHashCode function, as in Magnus's answer. Just call the damn static hash functions (who cares about the starter hash). Also, you could use the AddItems<T>(params T[] items) method more often in the helper class (than calling AddItem(T) each time). – nawfal
And what benefit do you find in doing this.result * Prime2 * item.GetHashCode(), when the commonly used form is this.result * Prime2 + item.GetHashCode()? – nawfal
That was a typo, thanks! – bitbonk
I can't use AddItems<T>(params T[] items) more often because typeof(T1) != typeof(T2), etc. – bitbonk
oh yes I missed that. – nawfal

Microsoft suggests several ways of hashing:

return this.value; // for classes that contain a single int value

return x ^ y; // for classes that contain multiple int values

return ((int)value ^ (int)(value >> 32)); // for classes that contain a single number bigger than int

return obj1.GetHashCode(); // for classes that contain a class instance field which inherits from object

return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); // for classes that contain multiple class instance fields which inherit from object

I can guess that for multiple big ints you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And the same for multiple types: they are all converted first to int using GetHashCode(), then the int values are XOR'ed, and the result is your hash.

For those who use the hash as an ID (I mean, a unique value): a hash is naturally limited to a number of digits; I think it was 5 bytes for the hashing algorithm, at least MD5...

You may turn multiple values into a hashed value, and some of them may be the same, so don't use it as an identifier... (maybe some day I am going to use your component)

Xoring integers to make a hashcode is a well-known antipattern that tends to result in a particularly high number of collisions with real-world values. – Jon Hanna
Everyone here uses integers, and there has never been any kind of guarantee for the hash to be unique; it just tries to vary as much as possible so that few collisions happen. – deadManN
Yes, but your second and fifth don't try to avoid collisions. – Jon Hanna
Not sure which thread it was... but it did the same: msdn.microsoft.com/en-us/library/… – deadManN
Yes, that antipattern is quite common. – Jon Hanna
That's why I rely on that, but thanks for lightening things up... The other question is whether the other patterns have less calculation time; you know, the collision vs. calculation time trade-off is also there. – deadManN
There's a balance to reach. Use a really good hash code like Spookyhash and you'll get much, much better collision avoidance but it will have much more calculation time than any of these (but when it comes to hashing very large amounts of data, Spookyhash is extremely quick). A simple shift on one of the values before xoring is only marginal extra cost for a good reduction in collision. Prime-number multiplication increasing both time and quality again. Which is better between shift or mult is hence debatable. Plain xor though very often has a lot of collisions on real data and is best avoided – Jon Hanna

Here's my helper class that uses Jon Skeet's implementation.

public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}

Usage:

public override int GetHashCode()
{
    return HashCode.Start
        .Hash(field1)
        .Hash(field2)
        .Hash(field3);
}

If you want to avoid writing an extension method for System.Int32:

public struct HashCode
{
    private readonly int hashCode;

    public HashCode(int hashCode)
    {
        this.hashCode = hashCode;
    }

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hashCode)
        => hashCode.GetHashCode();

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((this.hashCode * 31) + h));
    }

    public override int GetHashCode()
        => this.hashCode;
}

It's still generic, it still avoids any heap allocation and it's used exactly the same way:

public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(field1)
        .Hash(field2)     
        .Hash(field3);
}

Update after Martin's comment:

obj != null caused boxing so I switched to the default comparer.

  • See this answer regarding the default comparer's performance.
  • See this question for a discussion about the hash codes of null values.
I would change the line with the ternary operator to be: var h = Equals(obj, default(T)) ? 0 : obj.GetHashCode(); – Bill Barry
I believe that the ternary operator with obj != null will compile to a box instruction which will allocate memory if T is a value type. Instead you can use obj.Equals(null) which will compile to a virtual call of the Equals method. – Martin Liversage
Because this.hashCode != h. It wouldn't return the same value. – Şafak Gür
Sorry, I managed to remove my comment instead of editing it. Is it more beneficial to create a new struct, or to change the hashCode to non-readonly and do: "unchecked { this.hashCode ^= h * 397; } return this;" for example? – Erik Karlsson
Immutability has its benefits (Why are mutable structs evil?). About performance, what I do is pretty cheap since it does not allocate any space in the heap. – Şafak Gür
There is no boxing if you call it like Hash(1) and not like Hash<MyInterface>(myStruct). //allinonescript.com/questions/8823239 – user764754

Up until recently my answer would have been very close to Jon Skeet's here. However, I recently started a project which used power-of-two hash tables, that is hash tables where the size of the internal table is 8, 16, 32, etc. There's a good reason for favouring prime-number sizes, but there are some advantages to power-of-two sizes too.

And it pretty much sucked. So after a bit of experimentation and research I started re-hashing my hashes with the following:

public static int ReHash(int source)
{
  unchecked
  {
    ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
    ulong d = 0xE2ADBEEFDEADBEEF ^ c;
    ulong a = d += c = c << 15 | c >> -15;
    ulong b = a += d = d << 52 | d >> -52;
    c ^= b += a = a << 26 | a >> -26;
    d ^= c += b = b << 51 | b >> -51;
    a ^= d += c = c << 28 | c >> -28;
    b ^= a += d = d << 9 | d >> -9;
    c ^= b += a = a << 47 | a >> -47;
    d ^= c += b << 54 | b >> -54;
    a ^= d += c << 32 | c >> 32;
    a += d << 25 | d >> -25;
    return (int)(a >> 1);
  }
}

And then my power-of-two hash table didn't suck any more.

This disturbed me though, because the above shouldn't work. Or more precisely, it shouldn't work unless the original GetHashCode() was poor in a very particular way.

Re-mixing a hashcode can't improve a great hashcode, because the only possible effect is that we introduce a few more collisions.

Re-mixing a hash code can't improve a terrible hash code, because the only possible effect is that we change e.g. a large number of collisions on the value 53 into a large number of collisions on the value 183487291.

Re-mixing a hash code can only improve a hash code that did at least fairly well in avoiding absolute collisions throughout its range (2^32 possible values) but badly at avoiding collisions when modulo'd down for actual use in a hash table. While the simpler modulo of a power-of-two table made this more apparent, it was also having a negative effect with the more common prime-number tables, which just wasn't as obvious (the extra work in rehashing would outweigh the benefit, but the benefit would still be there).

Edit: I was also using open-addressing, which would also have increased the sensitivity to collision, perhaps more so than the fact it was power-of-two.

And well, it was disturbing how much string.GetHashCode() could be improved this way (on the order of tests running about 20-30 times faster, due to fewer collisions), and more disturbing how much my own hash codes could be improved (much more than that).

All the GetHashCode() implementations I'd coded in the past, and indeed used as the basis of answers on this site, were much worse than I'd thought. Much of the time they were "good enough" for much of the uses, but I wanted something better.

So I put that project to one side (it was a pet project anyway) and started looking at how to produce a good, well-distributed hash code in .NET quickly.

In the end I settled on porting SpookyHash to .NET. Indeed the code above is a fast-path version of using SpookyHash to produce a 32-bit output from a 32-bit input.

Now, SpookyHash is not a nice, quick-to-remember piece of code. My port of it is even less so, because I hand-inlined a lot of it for better speed*. But that's what code reuse is for.

Then I put that project to one side, because just as the original project had produced the question of how to produce a better hash code, so that project produced the question of how to produce a better .NET memcpy.

Then I came back, and produced a lot of overloads to easily feed just about all of the native types (except decimal†) into a hash code.

It's fast, for which Bob Jenkins deserves most of the credit because his original code I ported from is faster still, especially on 64-bit machines which the algorithm is optimised for‡.

The full code can be seen at https://bitbucket.org/JonHanna/spookilysharp/src but consider that the code above is a simplified version of it.

However, since it's now already written, one can make use of it more easily:

public override int GetHashCode()
{
  var hash = new SpookyHash();
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

It also takes seed values, so if you need to deal with untrusted input and want to protect against Hash DoS attacks you can set a seed based on uptime or similar, and make the results unpredictable by attackers:

private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;
public override int GetHashCode()
{
  //produce different hashes every time this application is restarted
  //but remain consistent in each run, so attackers have a harder time
  //DoSing the hash tables.
  var hash = new SpookyHash(hashSeed0, hashSeed1);
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

*A big surprise in this is that hand-inlining a rotation method that returned (x << n) | (x >> -n) improved things. I would have been sure that the jitter would have inlined that for me, but profiling showed otherwise.

†decimal isn't native from the .NET perspective, though it is from the C# perspective. The problem with it is that its own GetHashCode() treats precision as significant, while its own Equals() does not. Both are valid choices, but not mixed like that. In implementing your own version, you need to choose to do one or the other, but I can't know which you'd want.

‡By way of comparison. If used on a string, the SpookyHash on 64 bits is considerably faster than string.GetHashCode() on 32 bits which is slightly faster than string.GetHashCode() on 64 bits, which is considerably faster than SpookyHash on 32 bits, though still fast enough to be a reasonable choice.

When combining multiple hash values into one, I tend to use long values for the intermediate results, and then munge the final result down to an int. Does that seem like a good idea? My concern is that if one uses e.g. hash=(hash*31)+nextField, then pairs of matching values will only affect the upper 27 bits of the hash. Letting the calculation extend to a long and wrapping stuff in would minimize that danger. – supercat
@supercat it depends on the distribution of your final munging. The SpookilySharp library would ensure that the distribution was good, ideally (because it won't need object creation) by passing a pointer to a blittable type, or passing one of the enumerables it handles directly, but if you don't already have blittable data or a suitable enumeration, then calling .Update() with the multiple values as per the answer above will do the trick. – Jon Hanna
@JonHanna would you be willing to be more precise with the problematic behavior you encountered? I'm trying to implement a library that makes implementing value objects trivial (ValueUtils) and I'd love a testset demonstrating poor hash miscibility in power-of-two hashtables. – Eamon Nerbonne
@EamonNerbonne I don't really have anything more precise than "overall time was slower that way". As I added in an edit, the fact that I was using open-addressing may have been more important than the power-of-two factor. I do plan to do some test cases on a particular project where I'll be comparing a few different approaches, so I may have a better answer for you after that, though that's not a high-priority (a personal project with no pressing need, so I'll get to it when I get to it...) – Jon Hanna
@JonHanna: yeah I know how the personal project schedule goes - good luck! In any case, I see I didn't phrase that last comment well: I meant to ask for the problematic input, and not necessarily the details of the problems that resulted. I'd love to use that as a test set (or inspiration for a test set). In any case - good luck with your pet project :-). – Eamon Nerbonne
upvote
  flag
I'd bet that your ReHash is a big overkill. I guess it works well, but it may be even slower than cryptographic hashes, which are (sort of) proven to work perfectly. Java also uses power-of-two sized tables and there used to be a rather complicated re-hashing. It got simplified once tree nodes for collisions were introduced. – maaartinus
upvote
  flag
@maaartinus in terms of speed and distribution, it's well demonstrated. I'm now of the opinion that it's more trouble than it's worth for small values. I'd still use the fuller implementation of SpookyHash when it comes to hashing very large values like file contents. – Jon Hanna

Here is another fluent implementation of the algorithm posted above by Jon Skeet, but one that involves no allocations or boxing operations:

public static class Hash
{
    public const int Base = 17;

    public static int HashObject(this int hash, object obj)
    {
        unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
    }

    public static int HashValue<T>(this int hash, T value)
        where T : struct
    {
        unchecked { return hash * 23 + value.GetHashCode(); }
    }
}

Usage:

public class MyType<T>
{
    public string Name { get; set; }

    public string Description { get; set; }

    public int Value { get; set; }

    public IEnumerable<T> Children { get; set; }

    public override int GetHashCode()
    {
        return Hash.Base
            .HashObject(this.Name)
            .HashObject(this.Description)
            .HashValue(this.Value)
            .HashObject(this.Children);
    }
}

The compiler will ensure HashValue is not called with a class, thanks to the generic type constraint. There is no equivalent compiler support for HashObject, however, since adding a generic argument there would also add a boxing operation.
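If you also want to hash nullable value types without boxing, one more overload could be added along the same lines (my sketch, not part of the original helper):

public static int HashValue<T>(this int hash, T? value)
    where T : struct
{
    // nulls hash to 0, mirroring HashObject's treatment of null references
    unchecked { return hash * 23 + (value.HasValue ? value.Value.GetHashCode() : 0); }
}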

I ran into an issue with floats and decimals using the implementation selected as the answer above.

This test fails (the fields here are decimals; the hash is the same even though I switched two values to be negative):

        var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
        var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

But this test passes (with ints):

        var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
        var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

The likely culprit: negating a float or a decimal apparently flips only the sign bit of its hash code, and with a multiply-and-add combiner two adjacent sign-bit flips contribute (m + 1) * 2^31, which is 0 mod 2^32 for any odd multiplier m, so they cancel out exactly; an int's hash changes in many bits when negated, which is why the second test passes. I changed my implementation to not use GetHashCode for the primitive types, and it seems to work better:

    private static int InternalComputeHash(params object[] obj)
    {
        unchecked
        {
            var result = (int)SEED_VALUE_PRIME;
            for (int i = 0; i < obj.Length; i++)
            {
                result = (result * MULTIPLIER_VALUE_PRIME) + DetermineNextValue(obj[i]);
            }
            return result;
        }
    }

    private static int DetermineNextValue(object value)
    {
        unchecked
        {
            if (value is short
                || value is int
                || value is byte
                || value is sbyte
                || value is uint
                || value is ushort
                || value is ulong
                || value is long
                || value is float
                || value is double
                || value is decimal)
            {
                // Normalize numeric types instead of relying on GetHashCode.
                // Note: unchecked does not apply to Convert.ToInt32, so values
                // outside the int range will throw (see the comment below).
                return Convert.ToInt32(value);
            }

            return value != null ? value.GetHashCode() : 0;
        }
    }
1 upvote
  flag
In case you intended otherwise: unchecked does NOT affect Convert.ToInt32, so uint, long, float, double and decimal values can all overflow here. – Mark Hurd
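If you want to keep this approach but avoid the overflow, one option (my sketch, not the original author's code) is to widen the numeric types to long and fold the halves together the way long.GetHashCode() does, so out-of-range values wrap instead of throwing:

private static int Fold(long value)
{
    // XOR the two 32-bit halves, matching what long.GetHashCode() does
    return unchecked((int)value ^ (int)(value >> 32));
}

private static int DetermineNextValue(object value)
{
    unchecked
    {
        switch (value)
        {
            case null:      return 0;
            case short s:   return s;
            case int i:     return i;
            case byte b:    return b;
            case sbyte sb:  return sb;
            case ushort us: return us;
            case uint ui:   return (int)ui;
            case long l:    return Fold(l);
            case ulong ul:  return Fold((long)ul);
            // casts truncate fractions (Convert.ToInt32 rounded them); out-of-range
            // float/double casts yield unspecified values but do not throw
            case float f:   return Fold((long)f);
            case double d:  return Fold((long)d);
            // the remainder keeps the decimal-to-long conversion in range
            case decimal m: return Fold((long)(m % long.MaxValue));
            default:        return value.GetHashCode();
        }
    }
}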

Pretty much the same as nightcoder's solution, except that it's easier to change the primes if you want to.

PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with 9 defaults, but it would be slower, so you just close your eyes and try to forget about it.

/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}
1 upvote
  flag
Doesn't handle nulls. – JJS
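As the comment notes, null arguments will throw. One null-safe variation (a sketch of the two-argument overload; the same change applies to the rest) is to route through EqualityComparer<T>.Default, which returns 0 for null and avoids boxing (requires using System.Collections.Generic):

public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
{
    unchecked
    {
        int hash = PrimeOne;
        // EqualityComparer<T>.Default.GetHashCode(null) returns 0 rather than throwing
        hash = hash * PrimeTwo + EqualityComparer<T1>.Default.GetHashCode(arg1);
        hash = hash * PrimeTwo + EqualityComparer<T2>.Default.GetHashCode(arg2);
        return hash;
    }
}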

ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members.

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}

397 seems to be something of a ReSharper "signature": as of version 2016.2, all generated methods use it. Presumably it was chosen because it is a prime large enough that hashCode * 397 overflows int and mixes the bits.

As of https://github.com/dotnet/coreclr/pull/14863, there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.
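HashCode.Combine has overloads for up to eight values. For more fields than that, or to include a collection's elements, the same type also supports an incremental pattern (the field names below are placeholders):

public override int GetHashCode()
{
    var hash = new HashCode();
    hash.Add(field1);
    hash.Add(field2);
    foreach (var child in children)  // e.g. fold in a collection's elements
        hash.Add(child);
    return hash.ToHashCode();
}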

upvote
  flag
That looks like a sweet addition... any way to know what version of .NET Core that'll ship in? – Dan J
1 upvote
  flag
@DanJ What a happy coincidence, the HashCode changes for corefx were merged just a couple of hours before your comment :) The type is slated to ship in .NET Core 2.1. – James Ko
upvote
  flag
That is awesome - and quite the turnaround time. Upvoted. :) – Dan J
upvote
  flag
@DanJ Even better news-- it should be available right now on the nightly builds of CoreFX hosted on the dotnet-core MyGet feed. – James Ko
upvote
  flag
Sweet - that doesn't help me at work, since we're not quite that bleeding-edge, but good to know. Cheers! – Dan J
