|
|
Home » Community » Coffee corner » Map implementation
|
|
|
|
Re: Map implementation [message #51488 is a reply to message #51487] |
Tue, 02 April 2019 17:48   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
cbpporter wrote on Tue, 02 April 2019 08:13
There is a lot to go through...
This is actually not that hard. All aspects of the hash table design are very well described in "I Wrote The Fastest Hashtable". You just need to choose your design. The easiest way to start with is "Powers of Two", linear probing, internal chaining. "internal chaining" means that you explicitly store a pointer to the next element in a chain, and "linear probing" means that you are using linear memory scan to find first available slot.
Ideally, your hash table should be completely policy-based, so you can easily replace linear probing with quadratic probing, for example.
It is not hard to implement a table having one fixed design. What is really hard is to make it completely policy-based.
Regards,
Novo
|
|
|
|
|
Re: Map implementation [message #51521 is a reply to message #51519] |
Mon, 08 April 2019 22:37   |
 |
mirek
Messages: 14255 Registered: November 2005
|
Ultimate Member |
|
|
Yeah, that is uncessary.
Anyway, Index will probably get overhaul soon. I plan to drop unused features and slightly change the semantics to gain more performance and cleaner API and operation.
Mirek
[Updated on: Tue, 09 April 2019 09:29] Report message to a moderator
|
|
|
Re: Map implementation [message #51522 is a reply to message #51521] |
Tue, 09 April 2019 14:57   |
cbpporter
Messages: 1427 Registered: September 2007
|
Ultimate Contributor |
|
|
I re-implemented Index (copy&paste, trancode and cleanup) and should be a good starting point since it is smaller and simpler than a lot of maps out there.
I'll test mine thoroughly and benchmark, but before I benchmarked U++ vs. set vs. multiset, just to set my expectations accordingly.
U++ is pretty fast when compared to STL, but it does have exponential growth. With many items, in some tests, when multiplying items by 10, STL will go from like 250ms to 320, while Index will go from 250ms to 2 minutes .
I used this simple test:
const int SEQ = 10;
const int MAX = 3000000 * SEQ;
const int JUMP = 100;
{
Index<int> ind;
StopWatch ts;
for (int j = 0; j < MAX; j += JUMP)
for (int i = j; i < j + SEQ; i++) {
ind.Add(i);
//ind.Debug(Cout());
}
Cout() << "U++ add " << ts.Elapsed() << "\n";
ts.Reset();
int count = 0;
for (int i = 0; i < MAX; i++)
if (ind.Find(i) != -1)
count++;
Cout() << "U++ find " << ts.Elapsed() << "\n";
Cout() << count << "\n";
}
{
std::set<int> ind;
StopWatch ts;
for (int j = 0; j < MAX; j += JUMP)
for (int i = j; i < j + SEQ; i++) {
ind.insert(i);
//ind.Debug(Cout());
}
Cout() << "set add " << ts.Elapsed() << "\n";
ts.Reset();
int count = 0;
for (int i = 0; i < MAX; i++)
if (ind.find(i) != ind.end())
count++;
Cout() << "set find " << ts.Elapsed() << "\n";
Cout() << count << "\n";
}
{
std::multiset<int> ind;
StopWatch ts;
for (int j = 0; j < MAX; j += JUMP)
for (int i = j; i < j + SEQ; i++) {
ind.insert(i);
//ind.Debug(Cout());
}
Cout() << "multiset add " << ts.Elapsed() << "\n";
ts.Reset();
int count = 0;
for (int i = 0; i < MAX; i++)
if (ind.find(i) != ind.end())
count++;
Cout() << "multiset find " << ts.Elapsed() << "\n";
Cout() << count << "\n";
}
Is this an OK first artificial benchmark?
For MAX as large as the example, this is the starting point where U++ begins to loose vs. STL.
Next I need to compare memory usage...
Edit: I broke the benchmark into add and find, and the exponential growth for many items is at the find phase.
[Updated on: Tue, 09 April 2019 15:03] Report message to a moderator
|
|
|
Re: Map implementation [message #51523 is a reply to message #51522] |
Tue, 09 April 2019 17:16   |
 |
mirek
Messages: 14255 Registered: November 2005
|
Ultimate Member |
|
|
You have accidentally succeeded in attack on used hash mechanism - what you see is result of hash collisions. This is sad and something to deal with, however this would be much harder to achieve with String and quite unlikely to happen with real (aka random) data.
Still it is something I am worried about and will fix properly in the next version. For now, try to replace
inline dword FoldHash(dword h)
{
return h - 362437 * SwapEndian32(h);
}
inline int& HashBase::Maph(unsigned _hash) const
{
unsigned h = _hash & ~UNSIGNED_HIBIT;
return map[mask & FoldHash(h)];
}
(In future, I will make '362437' number random prime, which will likely make this kind of attack impossible).
Mirek
EDIT: Forgot the change in Map.h
[Updated on: Wed, 10 April 2019 00:21] Report message to a moderator
|
|
|
Re: Map implementation [message #51524 is a reply to message #51523] |
Wed, 10 April 2019 11:07   |
 |
mirek
Messages: 14255 Registered: November 2005
|
Ultimate Member |
|
|
Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h)
{
return SwapEndian32(2833151717 * h);
}
Interestingly, in this particular case it runs slower. It took me a couple of hours to figure out why: It is cache issue. This much better FoldHash actually spreads hashes nicely through hash space (thus causing cache misses), while the previous one tended to put them close (cache hits). It had accidentally fixed this particular benchmark's collisions, but would probably fail in some other scenario.
Conclusion: At this number of elements, the benchmark is memory bound (for int), so well working hasmap will perform similary to std::set (actually, I think the benchmark favors std::set here, as sequential numbers will repeat the same path in the binary tree, which is more cache friendly).
Original FoldHash was too simplistic for ink keys, this new version should be harder to attack.
In future version of Index, I will try to randomize FoldHash (and other hashing ops).
Mirek
|
|
|
|
Re: Map implementation [message #51526 is a reply to message #51524] |
Wed, 10 April 2019 16:13   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
mirek wrote on Wed, 10 April 2019 05:07Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h)
{
return SwapEndian32(2833151717 * h);
}
Clang. Cpp14.
uppsrc/Core/Ops.h:289:9: error: call to 'SwapEndian32' is ambiguous
return SwapEndian32(2833151717 * h);
^~~~~~~~~~~~
/home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:44:15: note: candidate function
inline dword SwapEndian32(dword v) { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; }
^
/home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:45:15: note: candidate function
inline int SwapEndian32(int v) { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; }
^
1 error generated.
Regards,
Novo
|
|
|
Re: Map implementation [message #51527 is a reply to message #51522] |
Wed, 10 April 2019 16:48   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
cbpporter wrote on Tue, 09 April 2019 08:57Is this an OK first artificial benchmark?
More scenarios for testing:
* Fill: insert a randomly shuffled sequence of unique keys into the table.
* Presized fill: like Fill, but first reserve enough memory for all the keys we'll insert, to prevent rehashing and reallocing during the fill process.
* Lookup: perform 100K lookups of random keys, all of which are in the table.
* Failed lookup: perform 100K lookups of random keys, none of which are in the table.
* Remove: remove a randomly chosen half of the elements from a table.
* Destruct: destroy a table and free its memory.
Regards,
Novo
|
|
|
Re: Map implementation [message #51528 is a reply to message #51525] |
Wed, 10 April 2019 17:09   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
cbpporter wrote on Wed, 10 April 2019 05:39I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.
IMHO, it is impossible to create one ideal hash table which will greatly outperform STL in all possible scenarios.
Let's take a look at two scenarios.
1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.
You will need two completely different implementations in these cases.
Two more scenarios.
1. Add data once and search for data most of the time.
2. Add/remove data most of the time and search for it occasionally.
Again, you will need two completely different implementations.
Regards,
Novo
|
|
|
|
|
Re: Map implementation [message #51546 is a reply to message #51529] |
Sat, 13 April 2019 20:58   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
mirek wrote on Wed, 10 April 2019 11:37
That is probably true, but mostly because at some point the limiting factor becomes cache / memory performance.
There are ways to deal with this. From the top of my head: Cache-Conscious Data Structures, Cache-oblivious algorithms.
Regards,
Novo
|
|
|
Re: Map implementation [message #51547 is a reply to message #51529] |
Sat, 13 April 2019 21:07   |
Novo
Messages: 1430 Registered: December 2006
|
Ultimate Contributor |
|
|
mirek wrote on Wed, 10 April 2019 11:37
Quote:
Let's take a look at two scenarios.
1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.
You will need two completely different implementations in these cases.
That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).
IMHO, this is impossible because in the first case your main concern is the memory usage. Tiny overhead multiplied by million is a huge problem. And in the second case performance is the main issue.
Regards,
Novo
|
|
|
|
Goto Forum:
Current Time: Fri Apr 25 12:00:25 CEST 2025
Total time taken to generate the page: 0.00988 seconds
|
|
|