I have implemented plenty of stuff since I'm a programmer, including many containers and even multiple small GUI toolkits (not as good as U++ of course ) but I never implemented a hash maps, let alone one that is accessibly as an array like in U++.
Can I shamelessly dissect and steal VectorMap?
Or are you aware of some other easy to learn and re-implement version of hash maps out there that will also perform more than adequately?
And how would you compare the U++ version to a more "standard" one?
Or are you aware of some other easy to learn and re-implement version of hash maps out there that will also perform more than adequately?
There is a lot to go through...
const int SEQ = 10; const int MAX = 3000000 * SEQ; const int JUMP = 100; { Index<int> ind; StopWatch ts; for (int j = 0; j < MAX; j += JUMP) for (int i = j; i < j + SEQ; i++) { ind.Add(i); //ind.Debug(Cout()); } Cout() << "U++ add " << ts.Elapsed() << "\n"; ts.Reset(); int count = 0; for (int i = 0; i < MAX; i++) if (ind.Find(i) != -1) count++; Cout() << "U++ find " << ts.Elapsed() << "\n"; Cout() << count << "\n"; } { std::set<int> ind; StopWatch ts; for (int j = 0; j < MAX; j += JUMP) for (int i = j; i < j + SEQ; i++) { ind.insert(i); //ind.Debug(Cout()); } Cout() << "set add " << ts.Elapsed() << "\n"; ts.Reset(); int count = 0; for (int i = 0; i < MAX; i++) if (ind.find(i) != ind.end()) count++; Cout() << "set find " << ts.Elapsed() << "\n"; Cout() << count << "\n"; } { std::multiset<int> ind; StopWatch ts; for (int j = 0; j < MAX; j += JUMP) for (int i = j; i < j + SEQ; i++) { ind.insert(i); //ind.Debug(Cout()); } Cout() << "multiset add " << ts.Elapsed() << "\n"; ts.Reset(); int count = 0; for (int i = 0; i < MAX; i++) if (ind.find(i) != ind.end()) count++; Cout() << "multiset find " << ts.Elapsed() << "\n"; Cout() << count << "\n"; }
inline dword FoldHash(dword h) { return h - 362437 * SwapEndian32(h); } inline int& HashBase::Maph(unsigned _hash) const { unsigned h = _hash & ~UNSIGNED_HIBIT; return map[mask & FoldHash(h)]; }
inline dword FoldHash(dword h) { return SwapEndian32(2833151717 * h); }
Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h) { return SwapEndian32(2833151717 * h); }
uppsrc/Core/Ops.h:289:9: error: call to 'SwapEndian32' is ambiguous return SwapEndian32(2833151717 * h); ^~~~~~~~~~~~ /home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:44:15: note: candidate function inline dword SwapEndian32(dword v) { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; } ^ /home/ssg/dvlp/cpp/upp/git/uppsrc/Core/Ops.h:45:15: note: candidate function inline int SwapEndian32(int v) { __asm__("bswap %0" : "=r" (v) : "0" (v)); return v; } ^ 1 error generated.
Is this an OK first artificial benchmark?
I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.
cbpporter wrote on Wed, 10 April 2019 05:39I also talked with a colleague and his hashmap version is supposedly 3x+ faster than stl, so it looks like it is very possible to greatly outperform it.
IMHO, it is impossible to create one ideal hash table which will greatly outperform STL in all possible scenarios.
Let's take a look at two scenarios.
1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.
You will need two completely different implementations in these cases.
1. Add data once and search for data most of the time.
2. Add/remove data most of the time and search for it occasionally.
Update: Upon further investigation... The proposed change is not perfect. Better solution:
inline dword FoldHash(dword h) { return SwapEndian32(2833151717 * h); }
inline dword FoldHash(dword h) { return SwapEndian32(0xa3613c16 * h); }
That is probably true, but mostly because at some point the limiting factor becomes cache / memory performance.
Quote:
Let's take a look at two scenarios.
1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.
You will need two completely different implementations in these cases.
That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).
About the only thing that is in question is how to deal with collisions. Some advanced hashmaps might e.g. use binary trees to resolve collisions. I believe that it is not an overal gain (and the fact that it is not widely used in industry makes it likely) and that we should rather invest time to investigate proper hashing techniques.
Anyway, the real benchmark would be to create real world scenario and test there. IMO U++ Index/VectorMap wins that easily.
mirek wrote on Wed, 10 April 2019 11:37
Quote:
Let's take a look at two scenarios.
1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.
You will need two completely different implementations in these cases.
That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).
IMHO, this is impossible because in the first case your main concern is the memory usage. Tiny overhead multiplied by million is a huge problem. And in the second case performance is the main issue.
If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.
Vector<Index<int> > v; Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; v.SetCount(v_num); Cout() << "Created " << v_num << " empty Index<int>" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; const int isize = 100; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Add(i); Cout() << "Added " << i + 1 << " elements" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; }
sizeof(Index<int>): 80 bytes Mem used: 0 Kb Created 1000000 empty Index<int> Mem used: 78128 Kb Added 1 elements Mem used: 237028 Kb Added 2 elements Mem used: 237028 Kb Added 3 elements Mem used: 237028 Kb Added 4 elements Mem used: 237028 Kb Added 5 elements Mem used: 237028 Kb Added 6 elements Mem used: 237028 Kb Added 7 elements Mem used: 237028 Kb Added 8 elements Mem used: 237028 Kb Added 9 elements Mem used: 397796 Kb Added 10 elements Mem used: 397796 Kb ... Added 99 elements Mem used: 2592740 Kb Added 100 elements Mem used: 2592740 Kb
mirek wrote on Sun, 14 April 2019 02:03If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.
How to deal with the memory hierarchy is more or less clear (Cache-Conscious Data Structures, Cache-oblivious algorithms).
What I'm trying to say is that by using a little bit more memory you can significantly improve performance. For example, you can create a bitset of unoccupied slots. That would be overkill for a small hash table.
I made a simple test.
Vector<Index<int> > v; Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; v.SetCount(v_num); Cout() << "Created " << v_num << " empty Index<int>" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; const int isize = 100; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Add(i); Cout() << "Added " << i + 1 << " elements" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; }
Result:
sizeof(Index<int>): 80 bytes Mem used: 0 Kb Created 1000000 empty Index<int> Mem used: 78128 Kb Added 1 elements Mem used: 237028 Kb Added 2 elements Mem used: 237028 Kb Added 3 elements Mem used: 237028 Kb Added 4 elements Mem used: 237028 Kb Added 5 elements Mem used: 237028 Kb Added 6 elements Mem used: 237028 Kb Added 7 elements Mem used: 237028 Kb Added 8 elements Mem used: 237028 Kb Added 9 elements Mem used: 397796 Kb Added 10 elements Mem used: 397796 Kb ... Added 99 elements Mem used: 2592740 Kb Added 100 elements Mem used: 2592740 Kb
IMHO, it is possible to do much better than this ...
Added 100 elements
Mem used: 2592740 Kb
[/code]
IMHO, it is possible to do much better than this ...
#include <Core/Core.h> #include <set> #include <vector> using namespace Upp; CONSOLE_APP_MAIN { int curMU = MemoryUsedKb(); int v_num = 1000000; std::vector< std::set<int> > v; Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; v.resize(v_num); Cout() << "Created " << v_num << " empty Index<int>" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; const int isize = 100; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].insert(i); Cout() << "Added " << i + 1 << " elements" << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; } }
Added 100 elements Mem used: 3222476 Kb
Added 100 elements Mem used: 12412392 Kb sizeof(std::unordered_set<int>) = 64
Added 100 elements Mem used: 4621148 Kb sizeof(std::unordered_set<int>) = 56
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.
Novo wrote on Tue, 16 April 2019 16:38BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.
Yeah, the only problem is that MemoryUsedKb does not work with standard allocator...
I'll try to find a memory profiler ...
Novo wrote on Tue, 16 April 2019 16:36
I'll try to find a memory profiler ...
Massif. Release conf + debug info. Index<int>
std::set<int> is using 3.8Gb ...
Well, nothing surprising there, right?
$ ./test_ht_perf Mem used: 78128 Kb Index<int> Add: 7.035 Index<int> Unlink: 12.272 Mem used: 2592740 Kb Mem used after Sweep: 3800292 Kb Mem used: 3769040 Kb std::set<int> insert: 7.381 std::set<int> erase: 4.296 Mem used: 7603920 Kb
if (true) { Vector<Index<int> > v; v.SetCount(v_num); const int isize = 100; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; TimeStop ts; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Add(i); } Cout() << "Index<int> Add: " << ts.ToString() << EOL; ts.Reset(); for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].UnlinkKey(i); } Cout() << "Index<int> Unlink: " << ts.ToString() << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL; } if (true) { std::set<int>* v; v = new std::set<int>[v_num]; const int isize = 100; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; TimeStop ts; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].insert(i); } Cout() << "std::set<int> insert: " << ts.ToString() << EOL; ts.Reset(); for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].erase(i); } Cout() << "std::set<int> erase: " << ts.ToString() << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; }
mirek wrote on Wed, 17 April 2019 02:54
Well, nothing surprising there, right?
Yes, but ...
$ ./test_ht_perf Mem used: 78128 Kb Index<int> Add: 7.035 Index<int> Unlink: 12.272 Mem used: 2592740 Kb Mem used after Sweep: 3800292 Kb Mem used: 3769040 Kb std::set<int> insert: 7.381 std::set<int> erase: 4.296 Mem used: 7603920 Kb
if (true) { Vector<Index<int> > v; v.SetCount(v_num); const int isize = 100; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; TimeStop ts; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Add(i); } Cout() << "Index<int> Add: " << ts.ToString() << EOL; ts.Reset(); for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].UnlinkKey(i); } Cout() << "Index<int> Unlink: " << ts.ToString() << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL; } if (true) { std::set<int>* v; v = new std::set<int>[v_num]; const int isize = 100; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; TimeStop ts; for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].insert(i); } Cout() << "std::set<int> insert: " << ts.ToString() << EOL; ts.Reset(); for (int i = 0; i < isize; ++i) { const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].erase(i); } Cout() << "std::set<int> erase: " << ts.ToString() << EOL; Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL; }
Project is attached.
std::set<int> erase is three times faster than Index<int> Unlink.
After calling Index::Sweep even more memory is used. I guess this is a problem with the allocator.
And Index invalidates pointers.
So ...
void HashBase::FinishIndex()
{
int q = link.GetCount();
link.Reserve(hash.GetAlloc()); <==== HERE
link.AddN(hash.GetCount() - q);
for(int i = q; i < hash.GetCount(); i++)
LinkTo(i, link[i], hash[i] & UNSIGNED_HIBIT ? unlinked : Mapi(i));
}
Index and allocator are now refactored, should behave better in synthetic benchmarks as is this one.
Mem used: 39064 Kb Index<int> Add: 5.691 Index<int> Unlink: 9.380 Mem used: 2959732 Kb Index<int> Sweep: 0.204 Mem used after Sweep: 2959732 Kb Index<int> Shrink: 0.122 Mem used after Shrink: 39096 Kb Mem used: 46908 Kb std::set<int> insert: 5.975 std::set<int> erase: 3.612 Mem used: 46904 Kb
#include <Core/Core.h> #include <set> using namespace Upp; CONSOLE_APP_MAIN { #ifdef _DEBUG const int v_num = 10000; #else const int v_num = 100000; #endif const int isize = 100; { Vector<Index<int> > v; v.SetCount(v_num); { RTIMING("FindAdd v_num outer"); for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].FindAdd(i); } { RTIMING("UnlinkKey v_num outer"); for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].UnlinkKey(i); } RTIMING("Sweep v_num outer"); const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); } { Vector<Index<int> > v; v.SetCount(v_num); { RTIMING("FindAdd v_num inner"); for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].FindAdd(i); } { RTIMING("UnlinkKey v_num inner"); for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].UnlinkKey(i); } RTIMING("Sweep v_num inner"); const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); } { std::set<int> *v = new std::set<int>[v_num]; { RTIMING("insert v_num outer"); for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].insert(i); } { RTIMING("erase v_num outer"); for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].erase(i); } delete[] v; } { std::set<int> *v = new std::set<int>[v_num]; { RTIMING("insert v_num inner"); for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].insert(i); } { RTIMING("erase v_num inner"); for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].erase(i); } delete[] v; } }
* C:\upp\out\benchmarks\MINGWx64\Index.exe 26.06.2019 11:36:42, user: cxl TIMING erase v_num inner: 480.00 ms - 480.00 ms (480.00 ms / 1 ), min: 480.00 ms, max: 480.00 ms, nesting: 0 - 1 TIMING insert v_num inner: 702.00 ms - 702.00 ms (702.00 ms / 1 ), min: 702.00 ms, max: 702.00 ms, nesting: 0 - 1 TIMING erase v_num outer: 427.00 ms - 427.00 ms (427.00 ms / 1 ), min: 427.00 ms, max: 427.00 ms, nesting: 0 - 1 TIMING insert v_num outer: 399.00 ms - 399.00 ms (399.00 ms / 1 ), min: 399.00 ms, max: 399.00 ms, nesting: 0 - 1 TIMING Sweep v_num inner: 22.00 ms - 22.00 ms (22.00 ms / 1 ), min: 22.00 ms, max: 22.00 ms, nesting: 0 - 1 TIMING UnlinkKey v_num inner: 995.00 ms - 995.00 ms (995.00 ms / 1 ), min: 995.00 ms, max: 995.00 ms, nesting: 0 - 1 TIMING FindAdd v_num inner: 683.00 ms - 683.00 ms (683.00 ms / 1 ), min: 683.00 ms, max: 683.00 ms, nesting: 0 - 1 TIMING Sweep v_num outer: 28.00 ms - 28.00 ms (28.00 ms / 1 ), min: 28.00 ms, max: 28.00 ms, nesting: 0 - 1 TIMING UnlinkKey v_num outer: 118.00 ms - 118.00 ms (118.00 ms / 1 ), min: 118.00 ms, max: 118.00 ms, nesting: 0 - 1 TIMING FindAdd v_num outer: 242.00 ms - 242.00 ms (242.00 ms / 1 ), min: 242.00 ms, max: 242.00 ms, nesting: 0 - 1
#include <Core/Core.h> #include <set> using namespace Upp; CONSOLE_APP_MAIN { #ifdef _DEBUG const int v_num = 10000; #else const int v_num = 1000; #endif const int isize = 100; const int N = 100; Vector<int> data; for(int i = 0; i < isize * v_num; i++) data.Add(Random()); for(int ii = 0; ii < N; ii++) { { Vector<Index<int> > v; v.SetCount(v_num); { RTIMING("inner FindAdd v_num"); int *s = data; for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].FindAdd(*s++); } { RTIMING("inner UnlinkKey v_num"); int *s = data; for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].UnlinkKey(*s++); } RTIMING("inner Sweep v_num"); const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); } { Vector<Index<int> > v; v.SetCount(v_num); { RTIMING("outer FindAdd v_num"); int *s = data; for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].FindAdd(*s++); } { RTIMING("outer UnlinkKey v_num"); int *s = data; for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].UnlinkKey(*s++); } RTIMING("outer Sweep v_num"); const int jsize = v_num; for (int j = 0; j < jsize; ++j) v[j].Sweep(); } { std::set<int> *v = new std::set<int>[v_num]; { RTIMING("outer insert v_num"); int *s = data; for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].insert(*s++); } { RTIMING("outer erase v_num"); int *s = data; for (int j = 0; j < v_num; ++j) for (int i = 0; i < isize; ++i) v[j].erase(*s++); } delete[] v; } { std::set<int> *v = new std::set<int>[v_num]; { RTIMING("inner insert v_num"); int *s = data; for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].insert(*s++); } { RTIMING("inner erase v_num"); int *s = data; for (int i = 0; i < isize; ++i) for (int j = 0; j < v_num; ++j) v[j].erase(*s++); } delete[] v; } } }