U++ forum: Welcome to the forum

Status & Roadmap

Authors & License

Funding Ultimate++

Search on this site

Search in forums

Home » Community » Coffee corner » Map implementation

Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend

Switch to threaded view of this topic

Create a new topic

Submit Reply

Map implementation [message #51398]

Thu, 21 March 2019 16:32

cbpporter is currently offline

cbpporter
Messages: 1427
Registered: September 2007

Ultimate Contributor

I have implemented plenty of stuff since I'm a programmer, including many containers and even multiple small GUI toolkits (not as good as U++ of course Laughing

Laughing

) but I never implemented a hash maps, let alone one that is accessibly as an array like in U++.

Can I shamelessly dissect and steal VectorMap? Razz

Razz

I'm currently studying HashBase, trying to figure out how it works and how the masking process works. Then Index...

Or are you aware of some other easy to learn and re-implement version of hash maps out there that will also perform more than adequately?

And how would you compare the U++ version to a more "standard" one?

Thanks!

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51403 is a reply to message #51398]

Fri, 22 March 2019 07:18

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

cbpporter wrote on Thu, 21 March 2019 16:32

I have implemented plenty of stuff since I'm a programmer, including many containers and even multiple small GUI toolkits (not as good as U++ of course ) but I never implemented a hash maps, let alone one that is accessibly as an array like in U++.

Can I shamelessly dissect and steal VectorMap?

It is opensource, is not it? And I have not applied for any patents.... Smile

Smile

Quote:

Or are you aware of some other easy to learn and re-implement version of hash maps out there that will also perform more than adequately?

And how would you compare the U++ version to a more "standard" one?

Compared to standard one, it is "alien technology" Smile

Smile

Mirek

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51456 is a reply to message #51398]

Wed, 27 March 2019 03:33

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

cbpporter wrote on Thu, 21 March 2019 11:32

Or are you aware of some other easy to learn and re-implement version of hash maps out there that will also perform more than adequately?

Useful links:
https://en.wikipedia.org/wiki/Open_addressing
https://probablydance.com/2017/02/26/i-wrote-the-fastest-has htable/
http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-h ash-table-with-tiny-memory-footprints/
https://aras-p.info/blog/2016/08/02/Hash-Functions-all-the-w ay-down/
https://preshing.com/20160201/new-concurrent-hash-maps-for-c pp/
http://szelei.me/constexpr-murmurhash/
https://opensource.googleblog.com/2014/03/introducing-farmha sh.html

Hope this helps. Smile

Smile

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51487 is a reply to message #51456]

Tue, 02 April 2019 14:13

cbpporter is currently offline

cbpporter
Messages: 1427
Registered: September 2007

Ultimate Contributor

Thank you very much!

There is a lot to go through...

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51488 is a reply to message #51487]

Tue, 02 April 2019 17:48

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

cbpporter wrote on Tue, 02 April 2019 08:13

There is a lot to go through...

This is actually not that hard. All aspects of the hash table design are very well described in "I Wrote The Fastest Hashtable". You just need to choose your design. The easiest way to start with is "Powers of Two", linear probing, internal chaining. "internal chaining" means that you explicitly store a pointer to the next element in a chain, and "linear probing" means that you are using linear memory scan to find first available slot.

Ideally, your hash table should be completely policy-based, so you can easily replace linear probing with quadratic probing, for example.

It is not hard to implement a table having one fixed design. What is really hard is to make it completely policy-based.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51489 is a reply to message #51488]

Tue, 02 April 2019 18:39

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

Actually, the right name for "internal chaining" seems to be "Coalesced hashing".

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51519 is a reply to message #51489]

Mon, 08 April 2019 13:57

cbpporter is currently offline

cbpporter
Messages: 1427
Registered: September 2007

Ultimate Contributor

I'm dissecting Index and noticed that Reindex calls ClearIndex and Free, while ClearIndex calls Free Smile

Smile

.

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51521 is a reply to message #51519]

Mon, 08 April 2019 22:37

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

Yeah, that is uncessary.

Anyway, Index will probably get overhaul soon. I plan to drop unused features and slightly change the semantics to gain more performance and cleaner API and operation.

Mirek

[Updated on: Tue, 09 April 2019 09:29]

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51522 is a reply to message #51521]

Tue, 09 April 2019 14:57

cbpporter is currently offline

cbpporter
Messages: 1427
Registered: September 2007

Ultimate Contributor

I re-implemented Index (copy&paste, trancode and cleanup) and should be a good starting point since it is smaller and simpler than a lot of maps out there.

I'll test mine thoroughly and benchmark, but before I benchmarked U++ vs. set vs. multiset, just to set my expectations accordingly.

U++ is pretty fast when compared to STL, but it does have exponential growth. With many items, in some tests, when multiplying items by 10, STL will go from like 250ms to 320, while Index will go from 250ms to 2 minutes Smile

Smile

.

I used this simple test:

	const int SEQ = 10;
	const int MAX = 3000000 * SEQ;
	const int JUMP = 100;
	
	{
		Index<int> ind;
		
		StopWatch ts;
		
		for (int j = 0; j < MAX; j += JUMP)
			for (int i = j; i < j + SEQ; i++) {
				ind.Add(i);
				//ind.Debug(Cout());
			}
		
		Cout() << "U++ add " << ts.Elapsed() << "\n";
		
		ts.Reset();
		
		int count = 0;
		for (int i = 0; i < MAX; i++)
			if (ind.Find(i) != -1)
				count++;
		
		Cout() << "U++ find " << ts.Elapsed() << "\n";
		Cout() << count << "\n";
	}
	
	{
		std::set<int> ind;
		
		StopWatch ts;
		
		for (int j = 0; j < MAX; j += JUMP)
			for (int i = j; i < j + SEQ; i++) {
				ind.insert(i);
				//ind.Debug(Cout());
			}
		
		Cout() << "set add " << ts.Elapsed() << "\n";
		
		ts.Reset();
		
		int count = 0;
		for (int i = 0; i < MAX; i++)
			if (ind.find(i) != ind.end())
				count++;
		
		Cout() << "set find " << ts.Elapsed() << "\n";
		Cout() << count << "\n";
	}
	
	{
		std::multiset<int> ind;
		
		StopWatch ts;
		
		
		for (int j = 0; j < MAX; j += JUMP)
			for (int i = j; i < j + SEQ; i++) {
				ind.insert(i);
				//ind.Debug(Cout());
			}
		
		Cout() << "multiset add " << ts.Elapsed() << "\n";
		
		ts.Reset();
		
		int count = 0;
		for (int i = 0; i < MAX; i++)
			if (ind.find(i) != ind.end())
				count++;
		
		Cout() << "multiset find " << ts.Elapsed() << "\n";
		Cout() << count << "\n";
	}

Is this an OK first artificial benchmark?

For MAX as large as the example, this is the starting point where U++ begins to loose vs. STL.

Next I need to compare memory usage...

Edit: I broke the benchmark into add and find, and the exponential growth for many items is at the find phase.

[Updated on: Tue, 09 April 2019 15:03]

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51523 is a reply to message #51522]

Tue, 09 April 2019 17:16

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

You have accidentally succeeded in attack on used hash mechanism - what you see is result of hash collisions. This is sad and something to deal with, however this would be much harder to achieve with String and quite unlikely to happen with real (aka random) data.

Still it is something I am worried about and will fix properly in the next version. For now, try to replace

inline dword FoldHash(dword h)
{
	return h - 362437 * SwapEndian32(h);
}

inline int& HashBase::Maph(unsigned _hash) const
{
	unsigned h = _hash & ~UNSIGNED_HIBIT;
	return map[mask & FoldHash(h)];
}

(In future, I will make '362437' number random prime, which will likely make this kind of attack impossible).

Mirek

EDIT: Forgot the change in Map.h

[Updated on: Wed, 10 April 2019 00:21]

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51524 is a reply to message #51523]

Wed, 10 April 2019 11:07

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

Update: Upon further investigation... The proposed change is not perfect. Better solution:


inline dword FoldHash(dword h)
{
	return SwapEndian32(2833151717 * h);
}

Interestingly, in this particular case it runs slower. It took me a couple of hours to figure out why: It is cache issue. This much better FoldHash actually spreads hashes nicely through hash space (thus causing cache misses), while the previous one tended to put them close (cache hits). It had accidentally fixed this particular benchmark's collisions, but would probably fail in some other scenario.

Conclusion: At this number of elements, the benchmark is memory bound (for int), so well working hasmap will perform similary to std::set (actually, I think the benchmark favors std::set here, as sequential numbers will repeat the same path in the binary tree, which is more cache friendly).

Original FoldHash was too simplistic for ink keys, this new version should be harder to attack.

In future version of Index, I will try to randomize FoldHash (and other hashing ops).

Mirek

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51525 is a reply to message #51524]

Wed, 10 April 2019 11:39

cbpporter is currently offline

cbpporter
Messages: 1427
Registered: September 2007

Ultimate Contributor

I tested the "h - 362437 * SwapEndian32(h);" version and so far it seems to fix the collision issue and also regularly blows std::set out of the water. Needs more testing to cover enough cases. I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.

I will test the 2833151717 version too.

And with other types, like points and strings.

Anyway, Index has been highly educational. A lot of resources out there are either very basic, talking more about the principles of managing buckets or are about taking something that works very well and squeezing the last bits of performance out of it.

I'm still not experienced enough to tell how well things should be distributed when "map" inside HashBase grows, but I added plenty of debug methods...

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51526 is a reply to message #51524]

Wed, 10 April 2019 16:13

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

mirek wrote on Wed, 10 April 2019 05:07

Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h)
{
	return SwapEndian32(2833151717 * h);
}

Clang. Cpp14.

uppsrc/Core/Ops.h:289:9: error: call to 'SwapEndian32' is ambiguous
        return SwapEndian32(2833151717 * h);
               ^~~~~~~~~~~~
/home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:44:15: note: candidate function
inline dword  SwapEndian32(dword v)   { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; }
              ^
/home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:45:15: note: candidate function
inline int    SwapEndian32(int v)     { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; }
              ^
1 error generated.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51527 is a reply to message #51522]

Wed, 10 April 2019 16:48

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

cbpporter wrote on Tue, 09 April 2019 08:57

Is this an OK first artificial benchmark?

More scenarios for testing:

* Fill: insert a randomly shuffled sequence of unique keys into the table.
* Presized fill: like Fill, but first reserve enough memory for all the keys we'll insert, to prevent rehashing and reallocing during the fill process.
* Lookup: perform 100K lookups of random keys, all of which are in the table.
* Failed lookup: perform 100K lookups of random keys, none of which are in the table.
* Remove: remove a randomly chosen half of the elements from a table.
* Destruct: destroy a table and free its memory.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51528 is a reply to message #51525]

Wed, 10 April 2019 17:09

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

cbpporter wrote on Wed, 10 April 2019 05:39

I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.

IMHO, it is impossible to create one ideal hash table which will greatly outperform STL in all possible scenarios.
Let's take a look at two scenarios.

1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.

You will need two completely different implementations in these cases.

Two more scenarios.

1. Add data once and search for data most of the time.
2. Add/remove data most of the time and search for it occasionally.

Again, you will need two completely different implementations.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51529 is a reply to message #51528]

Wed, 10 April 2019 17:37

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

Novo wrote on Wed, 10 April 2019 17:09

cbpporter wrote on Wed, 10 April 2019 05:39
I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.

IMHO, it is impossible to create one ideal hash table which will greatly outperform STL in all possible scenarios.

That is probably true, but mostly because at some point the limiting factor becomes cache / memory performance.

Quote:

Let's take a look at two scenarios.

1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.

You will need two completely different implementations in these cases.

That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).

Quote:

1. Add data once and search for data most of the time.
2. Add/remove data most of the time and search for it occasionally.

Ditto.

About the only thing that is in question is how to deal with collisions. Some advanced hashmaps might e.g. use binary trees to resolve collisions. I believe that it is not an overal gain (and the fact that it is not widely used in industry makes it likely) and that we should rather invest time to investigate proper hashing techniques.

Anyway, the real benchmark would be to create real world scenario and test there. IMO U++ Index/VectorMap wins that easily.

Mirek

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51530 is a reply to message #51526]

Wed, 10 April 2019 17:50

mirek is currently offline

mirek
Messages: 14264
Registered: November 2005

Ultimate Member

[quote title=Novo wrote on Wed, 10 April 2019 16:13]mirek wrote on Wed, 10 April 2019 05:07

Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h)
{
	return SwapEndian32(2833151717 * h);
}

OK, seems I have run out of dword range accidentally here... Smile

Smile

inline dword FoldHash(dword h)
{
	return SwapEndian32(0xa3613c16 * h);
}

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51546 is a reply to message #51529]

Sat, 13 April 2019 20:58

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

mirek wrote on Wed, 10 April 2019 11:37

That is probably true, but mostly because at some point the limiting factor becomes cache / memory performance.

There are ways to deal with this. From the top of my head: Cache-Conscious Data Structures, Cache-oblivious algorithms.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51547 is a reply to message #51529]

Sat, 13 April 2019 21:07

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

mirek wrote on Wed, 10 April 2019 11:37

Quote:

Let's take a look at two scenarios.

1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.

You will need two completely different implementations in these cases.

That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).

IMHO, this is impossible because in the first case your main concern is the memory usage. Tiny overhead multiplied by million is a huge problem. And in the second case performance is the main issue.

Regards,
Novo

Report message to a moderator

Send a private message to this user

Re: Map implementation [message #51548 is a reply to message #51529]

Sat, 13 April 2019 21:25

Novo is currently offline

Novo
Messages: 1430
Registered: December 2006

Ultimate Contributor

mirek wrote on Wed, 10 April 2019 11:37

About the only thing that is in question is how to deal with collisions. Some advanced hashmaps might e.g. use binary trees to resolve collisions. I believe that it is not an overal gain (and the fact that it is not widely used in industry makes it likely) and that we should rather invest time to investigate proper hashing techniques.

Anyway, the real benchmark would be to create real world scenario and test there. IMO U++ Index/VectorMap wins that easily.

AFAIK, STL can't use open addressing or other such techniques because it is specified to maintain stable key/value addresses. U++ Index doesn't support that. This is why it is possible to make it faster.

About collisions. There are many ways to deal with them. A classic approach is to store chain either explicitly (a list) or implicitly (you know how to calculate address of the next colliding slot). I saw a paper from Microsoft recommending an overflow area ...

Basically, what I want to say is that there is a million of different ways to design a hash table.
IMHO, the best way is to split it into multiple policies, so you can easily replace one part of it without rewriting the whole data structure. Smile

Smile

I personally couldn't figure out how to do that. Smile

Smile

Regards,
Novo

Report message to a moderator

Send a private message to this user

Pages (2): [1 2 › »]

Switch to threaded view of this topic

Create a new topic

Submit Reply

Previous Topic:	[Skylark][witz] How would you do this
Next Topic:	[SOLVED]Sharing Ptr of object to dll

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

] [

PDF

]

Current Time: Thu Jul 03 00:08:01 CEST 2025

Total time taken to generate the page: 0.03581 seconds