Overview
Examples
Screenshots
Comparisons
Applications
Download
Documentation
Tutorials
Bazaar
Status & Roadmap
FAQ
Authors & License
Forums
Funding Ultimate++
Search on this site
Search in forums












SourceForge.net Logo
Home » Community » Coffee corner » Map implementation
Re: Map implementation [message #51549 is a reply to message #51547] Sun, 14 April 2019 08:03 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Sat, 13 April 2019 21:07
mirek wrote on Wed, 10 April 2019 11:37

Quote:

Let's take a look at two scenarios.

1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.

You will need two completely different implementations in these cases.


That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).

IMHO, this is impossible because in the first case your main concern is the memory usage. Tiny overhead multiplied by million is a huge problem. And in the second case performance is the main issue.


If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.
Re: Map implementation [message #51552 is a reply to message #51549] Sun, 14 April 2019 18:55 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 817
Registered: December 2006
Experienced Contributor
mirek wrote on Sun, 14 April 2019 02:03
If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.

How to deal with the memory hierarchy is more or less clear (Cache-Conscious Data Structures, Cache-oblivious algorithms).
What I'm trying to say is that by using a little bit more memory you can significantly improve performance. For example, you can create a bitset of unoccupied slots. That would be overkill for a small hash table.

I made a simple test.
		Vector<Index<int> > v;
		Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		v.SetCount(v_num);
		Cout() << "Created " << v_num << " empty Index<int>" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int isize = 100;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
			Cout() << "Added " << i + 1 << " elements" << EOL;
			Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		}

Result:
sizeof(Index<int>): 80 bytes
Mem used: 0 Kb
Created 1000000 empty Index<int>
Mem used: 78128 Kb
Added 1 elements
Mem used: 237028 Kb
Added 2 elements
Mem used: 237028 Kb
Added 3 elements
Mem used: 237028 Kb
Added 4 elements
Mem used: 237028 Kb
Added 5 elements
Mem used: 237028 Kb
Added 6 elements
Mem used: 237028 Kb
Added 7 elements
Mem used: 237028 Kb
Added 8 elements
Mem used: 237028 Kb
Added 9 elements
Mem used: 397796 Kb
Added 10 elements
Mem used: 397796 Kb
...
Added 99 elements
Mem used: 2592740 Kb
Added 100 elements
Mem used: 2592740 Kb


IMHO, it is possible to do much better than this ...


Regards,
Novo
Re: Map implementation [message #51553 is a reply to message #51552] Sun, 14 April 2019 19:52 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Sun, 14 April 2019 18:55
mirek wrote on Sun, 14 April 2019 02:03
If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.

How to deal with the memory hierarchy is more or less clear (Cache-Conscious Data Structures, Cache-oblivious algorithms).
What I'm trying to say is that by using a little bit more memory you can significantly improve performance. For example, you can create a bitset of unoccupied slots. That would be overkill for a small hash table.

I made a simple test.
		Vector<Index<int> > v;
		Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		v.SetCount(v_num);
		Cout() << "Created " << v_num << " empty Index<int>" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int isize = 100;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
			Cout() << "Added " << i + 1 << " elements" << EOL;
			Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		}

Result:
sizeof(Index<int>): 80 bytes
Mem used: 0 Kb
Created 1000000 empty Index<int>
Mem used: 78128 Kb
Added 1 elements
Mem used: 237028 Kb
Added 2 elements
Mem used: 237028 Kb
Added 3 elements
Mem used: 237028 Kb
Added 4 elements
Mem used: 237028 Kb
Added 5 elements
Mem used: 237028 Kb
Added 6 elements
Mem used: 237028 Kb
Added 7 elements
Mem used: 237028 Kb
Added 8 elements
Mem used: 237028 Kb
Added 9 elements
Mem used: 397796 Kb
Added 10 elements
Mem used: 397796 Kb
...
Added 99 elements
Mem used: 2592740 Kb
Added 100 elements
Mem used: 2592740 Kb


IMHO, it is possible to do much better than this ...


Well, keep in mind that lower bound here is 400MB - that is what will cost to store keys itself. If these keys were String, which in reality is much more typical, it would be 1600MB just for values (if they are small).

Current Index overhead is on average ~20 bytes per element, which about matches what we see here. Hard to say you can do much better. E.g. typical 'collisions are linked list' implementation will use about the same number of bytes. Even open addressing will need at least 8 bytes per node if you want to have any meaningful 'payload'.

Anyway, thanks for test. You are right that you can reduce that for very specific scenarios. And next index version will probably do about 20% better.

BTW, test putting 100 * 1000000 elements into single Index will produce the same results.

Mirek

P.S.: Overall I am happy that you both started digging here as I am thinking about refactoring Index, deprecating and detuning some unsused features (ever used FindPrev? Smile in favor of those used most frequently.
Re: Map implementation [message #51558 is a reply to message #51552] Tue, 16 April 2019 12:38 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Sun, 14 April 2019 18:55

Added 100 elements
Mem used: 2592740 Kb
[/code]

IMHO, it is possible to do much better than this ...


Just for reference (Visual C++ 64 bit release):

#include <Core/Core.h>
#include <set>
#include <vector>

using namespace Upp;
		
CONSOLE_APP_MAIN
{
	int curMU = MemoryUsedKb();
	int v_num = 1000000;
	std::vector< std::set<int> > v;
	Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
	Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	v.resize(v_num);
	Cout() << "Created " << v_num << " empty Index<int>" << EOL;
	Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	const int isize = 100;
	for (int i = 0; i < isize; ++i) {
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].insert(i);
		Cout() << "Added " << i + 1 << " elements" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
}


Added 100 elements
Mem used: 3222476 Kb


Replace set with unorderd_set and

Added 100 elements
Mem used: 12412392 Kb

sizeof(std::unordered_set<int>) = 64


GCC 64bits, unorderd_set
Added 100 elements
Mem used: 4621148 Kb

sizeof(std::unordered_set<int>) = 56


BTW, I am now banging my head about reducing overhead from 20 bytes to 16 per element. It looks like it might not be possible without performance degradation (in all scenarios). So I would say it is really hard to get below 20 per element.

Mirek

[Updated on: Tue, 16 April 2019 12:40]

Report message to a moderator

Re: Map implementation [message #51559 is a reply to message #51558] Tue, 16 April 2019 16:38 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 817
Registered: December 2006
Experienced Contributor
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Regards,
Novo
Re: Map implementation [message #51560 is a reply to message #51559] Tue, 16 April 2019 19:33 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Tue, 16 April 2019 16:38
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Yeah, the only problem is that MemoryUsedKb does not work with standard allocator...
Re: Map implementation [message #51563 is a reply to message #51560] Tue, 16 April 2019 22:36 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 817
Registered: December 2006
Experienced Contributor
mirek wrote on Tue, 16 April 2019 13:33
Novo wrote on Tue, 16 April 2019 16:38
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Yeah, the only problem is that MemoryUsedKb does not work with standard allocator...

I was looking at top Smile
I'll try to find a memory profiler ...


Regards,
Novo
Re: Map implementation [message #51564 is a reply to message #51563] Wed, 17 April 2019 05:40 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 817
Registered: December 2006
Experienced Contributor
Novo wrote on Tue, 16 April 2019 16:36

I'll try to find a memory profiler ...

Massif. Release conf + debug info. Index<int>
std::set<int> is using 3.8Gb ...

index.php?t=getfile&id=5802&private=0


Regards,
Novo
Re: Map implementation [message #51565 is a reply to message #51564] Wed, 17 April 2019 08:54 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Wed, 17 April 2019 05:40
Novo wrote on Tue, 16 April 2019 16:36

I'll try to find a memory profiler ...

Massif. Release conf + debug info. Index<int>
std::set<int> is using 3.8Gb ...

index.php?t=getfile&id=5802&private=0


Well, nothing surprising there, right?

Mirek
Re: Map implementation [message #51569 is a reply to message #51565] Wed, 17 April 2019 16:13 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 817
Registered: December 2006
Experienced Contributor
mirek wrote on Wed, 17 April 2019 02:54

Well, nothing surprising there, right?

Yes, but ...

$ ./test_ht_perf 
Mem used: 78128 Kb
Index<int> Add: 7.035
Index<int> Unlink: 12.272
Mem used: 2592740 Kb
Mem used after Sweep: 3800292 Kb
Mem used: 3769040 Kb
std::set<int> insert: 7.381
std::set<int> erase: 4.296
Mem used: 7603920 Kb


	if (true) {
		Vector<Index<int> > v;
		v.SetCount(v_num);
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
		}
		Cout() << "Index<int> Add: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].UnlinkKey(i);
		}
		Cout() << "Index<int> Unlink: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
		Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
	
	if (true) {
		std::set<int>* v;
		v = new std::set<int>[v_num];
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].insert(i);
		}
		Cout() << "std::set<int> insert: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].erase(i);
		}
		Cout() << "std::set<int> erase: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}

Project is attached.

std::set<int> erase is three times faster than Index<int> Unlink.
After calling Index::Sweep even more memory is used. I guess this is a problem with the allocator.
And Index invalidates pointers.
So ...



Regards,
Novo
Re: Map implementation [message #51570 is a reply to message #51569] Wed, 17 April 2019 18:39 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Novo wrote on Wed, 17 April 2019 16:13
mirek wrote on Wed, 17 April 2019 02:54

Well, nothing surprising there, right?

Yes, but ...

$ ./test_ht_perf 
Mem used: 78128 Kb
Index<int> Add: 7.035
Index<int> Unlink: 12.272
Mem used: 2592740 Kb
Mem used after Sweep: 3800292 Kb
Mem used: 3769040 Kb
std::set<int> insert: 7.381
std::set<int> erase: 4.296
Mem used: 7603920 Kb


	if (true) {
		Vector<Index<int> > v;
		v.SetCount(v_num);
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
		}
		Cout() << "Index<int> Add: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].UnlinkKey(i);
		}
		Cout() << "Index<int> Unlink: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
		Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
	
	if (true) {
		std::set<int>* v;
		v = new std::set<int>[v_num];
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].insert(i);
		}
		Cout() << "std::set<int> insert: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].erase(i);
		}
		Cout() << "std::set<int> erase: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}

Project is attached.

std::set<int> erase is three times faster than Index<int> Unlink.
After calling Index::Sweep even more memory is used. I guess this is a problem with the allocator.
And Index invalidates pointers.
So ...




Cool intresting catch.

Took me a while digging into the memory issue and it is really interesting - it looks like the problem is in this line

Quote:

void HashBase::FinishIndex()
{
int q = link.GetCount();
link.Reserve(hash.GetAlloc()); <==== HERE
link.AddN(hash.GetCount() - q);
for(int i = q; i < hash.GetCount(); i++)
LinkTo(i, link[i], hash[i] & UNSIGNED_HIBIT ? unlinked : Mapi(i));
}


If you comment it out, all works fine. The reason is that at that point, 'overreservation' of link in this particular leads to going from small blocks to large ones. Those small ones then are left free for further use, which in this example never materializes.

I will definitely keep this scenario for testing with the new Index... That said, this really is not likely to happen in real app.

Going to look into Unlink issue now.
Re: Map implementation [message #51813 is a reply to message #51570] Fri, 07 June 2019 13:58 Go to previous message
mirek is currently offline  mirek
Messages: 11892
Registered: November 2005
Ultimate Member
Index and allocator are now refactored, should behave better in synthetic benchmarks as is this one.
Previous Topic: [Skylark][witz] How would you do this
Goto Forum:
  


Current Time: Thu Jun 20 07:03:20 CEST 2019

Total time taken to generate the page: 0.02085 seconds