Changes in hashing [message #54110] Mon, 01 June 2020 15:43
mirek
Up until now, hash codes in U++ were very strictly 32-bit dword. It turns out that using 64-bit hash codes on 64-bit CPUs is actually faster (more bytes can be processed at once), so I have introduced a new type, hash_t, which is 32 bits on a 32-bit CPU and 64 bits otherwise, and changed the code to compute and use 64-bit hashes instead. This results in about a 5% improvement in the idmap benchmark...

Practical consideration for user types: if a type supports hashing through a dword GetHashValue() method, it will continue to work just fine, but might be improved by converting the return type to hash_t. A template specialisation of GetHashValue needs to change its return type to hash_t (the compiler issues an error if it does not).
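A minimal standalone sketch of the point above, not taken from the U++ sources: `dword` and `hash_t` are redefined locally as stand-ins, and `LegacyKey`, `Point` and `HashOf` are hypothetical names used only for illustration. It shows why a method that still returns dword keeps working (the value simply widens to hash_t) and what a converted method could look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-ins for the U++ types named in the post (assumption: these are
// NOT the real U++ definitions, only types of the same widths).
typedef std::uint32_t dword;
typedef std::size_t   hash_t;   // 32 bits on a 32-bit CPU, 64 bits otherwise

// Hypothetical legacy type: still returns a 32-bit hash.
struct LegacyKey {
    dword id;
    dword GetHashValue() const { return id * 2654435761u; }
};

// Hypothetical converted type: returns the wider hash_t.
struct Point {
    int x, y;
    hash_t GetHashValue() const {
        hash_t h = static_cast<hash_t>(static_cast<unsigned>(x));
        return h * 31u + static_cast<unsigned>(y);
    }
};

// Hypothetical generic caller: accepts either return type, because a dword
// converts implicitly to hash_t.
template <class T>
hash_t HashOf(const T& v) { return v.GetHashValue(); }

// Small usage check: both kinds of type go through the same caller.
inline void HashOfExample() {
    LegacyKey k{7};
    Point p{1, 2};
    assert(HashOf(k) == static_cast<hash_t>(k.GetHashValue()));
    assert(HashOf(p) == 33);    // 1 * 31 + 2
}
```

In this sketch the legacy type needs no change at all; converting it only matters if the hash computation itself can make use of the extra bits.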

Mirek
Re: Changes in hashing [message #54177 is a reply to message #54110] Fri, 05 June 2020 12:28
Oblivion
Hello Mirek,

I'm somewhat confused about this.

Just to be clear: does this mean that client code can continue to use the dword GetHashValue() variant on 64-bit machines?

I'm worried because, e.g., in Terminal ctrl I use a dword hash of incoming string data as a unique ID for images and hyperlinks in each cell.
Changing it to 64-bit would be very expensive (memory consumption would be significantly higher).

Truncating 64-bit to 32-bit probably won't do too much harm here, but still, it means information loss and I'd prefer to avoid that.

Or do I have to separately maintain the 32-bit hash functions on 64-bit configurations?

Best regards,
Oblivion


Re: Changes in hashing [message #54178 is a reply to message #54177] Fri, 05 June 2020 12:40
mirek
Well, first of all, using GetHashValue as a unique ID is probably not a good idea, but I guess you mean something slightly different there.

Second, simply truncating to 32 bits would indeed, for memhash-produced numbers, give inferior hashes, just like taking the lower bits in the previous 32-bit incarnation, because fast hashing algorithms tend to accumulate entropy in the highest bits. But exactly for this reason we have had (for quite a long time) the FoldHash function, which IMO seems ideal for your scenario. It takes a hash_t and produces a dword, while bringing entropy back to the lowest bits. (BTW, look at the implementation, I think it is one of my more clever ideas... :) )
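To illustrate the difference between truncating and folding, here is a minimal sketch under stated assumptions. It is not the U++ FoldHash implementation: `TruncateTo32` and `FoldTo32` are hypothetical names, and the fold shown (multiply by an odd 64-bit constant, keep the upper half of the product) is just one common way to pull high-bit entropy into a 32-bit result.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t dword;   // local stand-in, not the U++ definition

// Plain truncation: keeps only the low 32 bits, so anything that differs
// only in the upper half of the 64-bit hash is lost.
inline dword TruncateTo32(std::uint64_t h) {
    return static_cast<dword>(h);
}

// Hypothetical fold (NOT U++ FoldHash): multiply by an odd constant so every
// input bit influences the upper half of the product, then keep that half.
inline dword FoldTo32(std::uint64_t h) {
    const std::uint64_t k = 0x9E3779B97F4A7C15ull;  // odd multiplier (assumption)
    return static_cast<dword>((h * k) >> 32);
}

// Small usage check with two hashes that differ only in their high bits.
inline void FoldExample() {
    const std::uint64_t a = 1ull << 40;
    const std::uint64_t b = 1ull << 41;
    assert(TruncateTo32(a) == TruncateTo32(b));  // both truncate to 0
    assert(FoldTo32(a) != FoldTo32(b));          // folding keeps them apart
}
```

In a cache keyed by a 32-bit value, as in the Terminal ctrl case described earlier, a fold of this kind would keep the per-cell storage at a dword while still using the information from the upper bits.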

Mirek
Re: Changes in hashing [message #54179 is a reply to message #54178] Fri, 05 June 2020 13:02
Oblivion
Quote:
first of all, using GetHashValue as a unique ID is probably not a good idea, but I guess you mean something slightly different there.


Yeah, the IDs are not "really" meant to be unique. Bad wording. They are used in cache management, and a dword is sufficient in my case.

Quote:
But exactly for this reason we have had (for quite a long time) the FoldHash function, which IMO seems ideal for your scenario. It takes a hash_t and produces a dword, while bringing entropy back to the lowest bits.


Well, I did not know that, because, you know, lack of documentation... :) But this is good news. I'll test with FoldHash and look into the code ASAP.

Quote:
(BTW, look at the implementation, I think it is one of my more clever ideas...)


You have a lot of clever ideas. That is the main reason why I prefer U++. :)

Thank you!

Best regards,
Oblivion

