Overview
Examples
Screenshots
Comparisons
Applications
Download
Documentation
Tutorials
Bazaar
Status & Roadmap
FAQ
Authors & License
Forums
Funding Ultimate++
Search on this site
Search in forums












SourceForge.net Logo
Home » Community » Coffee corner » Map implementation
Re: Map implementation [message #51549 is a reply to message #51547] Sun, 14 April 2019 08:03 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Sat, 13 April 2019 21:07
mirek wrote on Wed, 10 April 2019 11:37

Quote:

Let's take a look at two scenarios.

1. One million hash tables containing one hundred records.
2. One hash table containing one hundred million records.

You will need two completely different implementations in these cases.


That might be true, however I do not see a way how to improve Index for either (I see some accumulated knowledge how to improve it for both, but that is another story).

IMHO, this is impossible because in the first case your main concern is the memory usage. Tiny overhead multiplied by million is a huge problem. And in the second case performance is the main issue.


If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.
Re: Map implementation [message #51552 is a reply to message #51549] Sun, 14 April 2019 18:55 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
mirek wrote on Sun, 14 April 2019 02:03
If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.

How to deal with the memory hierarchy is more or less clear (Cache-Conscious Data Structures, Cache-oblivious algorithms).
What I'm trying to say is that by using a little bit more memory you can significantly improve performance. For example, you can create a bitset of unoccupied slots. That would be overkill for a small hash table.

I made a simple test.
		Vector<Index<int> > v;
		Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		v.SetCount(v_num);
		Cout() << "Created " << v_num << " empty Index<int>" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int isize = 100;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
			Cout() << "Added " << i + 1 << " elements" << EOL;
			Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		}

Result:
sizeof(Index<int>): 80 bytes
Mem used: 0 Kb
Created 1000000 empty Index<int>
Mem used: 78128 Kb
Added 1 elements
Mem used: 237028 Kb
Added 2 elements
Mem used: 237028 Kb
Added 3 elements
Mem used: 237028 Kb
Added 4 elements
Mem used: 237028 Kb
Added 5 elements
Mem used: 237028 Kb
Added 6 elements
Mem used: 237028 Kb
Added 7 elements
Mem used: 237028 Kb
Added 8 elements
Mem used: 237028 Kb
Added 9 elements
Mem used: 397796 Kb
Added 10 elements
Mem used: 397796 Kb
...
Added 99 elements
Mem used: 2592740 Kb
Added 100 elements
Mem used: 2592740 Kb


IMHO, it is possible to do much better than this ...


Regards,
Novo
Re: Map implementation [message #51553 is a reply to message #51552] Sun, 14 April 2019 19:52 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Sun, 14 April 2019 18:55
mirek wrote on Sun, 14 April 2019 02:03
If the performance is the issue, then the memory is the issue too. The game starts at L1 cache size, which can correspond to hunderds of records.

How to deal with the memory hierarchy is more or less clear (Cache-Conscious Data Structures, Cache-oblivious algorithms).
What I'm trying to say is that by using a little bit more memory you can significantly improve performance. For example, you can create a bitset of unoccupied slots. That would be overkill for a small hash table.

I made a simple test.
		Vector<Index<int> > v;
		Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		v.SetCount(v_num);
		Cout() << "Created " << v_num << " empty Index<int>" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int isize = 100;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
			Cout() << "Added " << i + 1 << " elements" << EOL;
			Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		}

Result:
sizeof(Index<int>): 80 bytes
Mem used: 0 Kb
Created 1000000 empty Index<int>
Mem used: 78128 Kb
Added 1 elements
Mem used: 237028 Kb
Added 2 elements
Mem used: 237028 Kb
Added 3 elements
Mem used: 237028 Kb
Added 4 elements
Mem used: 237028 Kb
Added 5 elements
Mem used: 237028 Kb
Added 6 elements
Mem used: 237028 Kb
Added 7 elements
Mem used: 237028 Kb
Added 8 elements
Mem used: 237028 Kb
Added 9 elements
Mem used: 397796 Kb
Added 10 elements
Mem used: 397796 Kb
...
Added 99 elements
Mem used: 2592740 Kb
Added 100 elements
Mem used: 2592740 Kb


IMHO, it is possible to do much better than this ...


Well, keep in mind that lower bound here is 400MB - that is what will cost to store keys itself. If these keys were String, which in reality is much more typical, it would be 1600MB just for values (if they are small).

Current Index overhead is on average ~20 bytes per element, which about matches what we see here. Hard to say you can do much better. E.g. typical 'collisions are linked list' implementation will use about the same number of bytes. Even open addressing will need at least 8 bytes per node if you want to have any meaningful 'payload'.

Anyway, thanks for test. You are right that you can reduce that for very specific scenarios. And next index version will probably do about 20% better.

BTW, test putting 100 * 1000000 elements into single Index will produce the same results.

Mirek

P.S.: Overall I am happy that you both started digging here as I am thinking about refactoring Index, deprecating and detuning some unsused features (ever used FindPrev? Smile in favor of those used most frequently.
Re: Map implementation [message #51558 is a reply to message #51552] Tue, 16 April 2019 12:38 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Sun, 14 April 2019 18:55

Added 100 elements
Mem used: 2592740 Kb
[/code]

IMHO, it is possible to do much better than this ...


Just for reference (Visual C++ 64 bit release):

#include <Core/Core.h>
#include <set>
#include <vector>

using namespace Upp;
		
CONSOLE_APP_MAIN
{
	int curMU = MemoryUsedKb();
	int v_num = 1000000;
	std::vector< std::set<int> > v;
	Cout() << "sizeof(Index<int>): " << sizeof(Index<int>) << " bytes" << EOL;
	Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	v.resize(v_num);
	Cout() << "Created " << v_num << " empty Index<int>" << EOL;
	Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	const int isize = 100;
	for (int i = 0; i < isize; ++i) {
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].insert(i);
		Cout() << "Added " << i + 1 << " elements" << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
}


Added 100 elements
Mem used: 3222476 Kb


Replace set with unorderd_set and

Added 100 elements
Mem used: 12412392 Kb

sizeof(std::unordered_set<int>) = 64


GCC 64bits, unorderd_set
Added 100 elements
Mem used: 4621148 Kb

sizeof(std::unordered_set<int>) = 56


BTW, I am now banging my head about reducing overhead from 20 bytes to 16 per element. It looks like it might not be possible without performance degradation (in all scenarios). So I would say it is really hard to get below 20 per element.

Mirek

[Updated on: Tue, 16 April 2019 12:40]

Report message to a moderator

Re: Map implementation [message #51559 is a reply to message #51558] Tue, 16 April 2019 16:38 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Regards,
Novo
Re: Map implementation [message #51560 is a reply to message #51559] Tue, 16 April 2019 19:33 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Tue, 16 April 2019 16:38
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Yeah, the only problem is that MemoryUsedKb does not work with standard allocator...
Re: Map implementation [message #51563 is a reply to message #51560] Tue, 16 April 2019 22:36 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
mirek wrote on Tue, 16 April 2019 13:33
Novo wrote on Tue, 16 April 2019 16:38
BTW, In the project that I attached there are two configurations. A default one, which is using an U++ allocator, and a second one, which is using a standard allocator.
You should get different numbers with different configurations.


Yeah, the only problem is that MemoryUsedKb does not work with standard allocator...

I was looking at top Smile
I'll try to find a memory profiler ...


Regards,
Novo
Re: Map implementation [message #51564 is a reply to message #51563] Wed, 17 April 2019 05:40 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
Novo wrote on Tue, 16 April 2019 16:36

I'll try to find a memory profiler ...

Massif. Release conf + debug info. Index<int>
std::set<int> is using 3.8Gb ...

index.php?t=getfile&id=5802&private=0


Regards,
Novo
Re: Map implementation [message #51565 is a reply to message #51564] Wed, 17 April 2019 08:54 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Wed, 17 April 2019 05:40
Novo wrote on Tue, 16 April 2019 16:36

I'll try to find a memory profiler ...

Massif. Release conf + debug info. Index<int>
std::set<int> is using 3.8Gb ...

index.php?t=getfile&id=5802&private=0


Well, nothing surprising there, right?

Mirek
Re: Map implementation [message #51569 is a reply to message #51565] Wed, 17 April 2019 16:13 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
mirek wrote on Wed, 17 April 2019 02:54

Well, nothing surprising there, right?

Yes, but ...

$ ./test_ht_perf 
Mem used: 78128 Kb
Index<int> Add: 7.035
Index<int> Unlink: 12.272
Mem used: 2592740 Kb
Mem used after Sweep: 3800292 Kb
Mem used: 3769040 Kb
std::set<int> insert: 7.381
std::set<int> erase: 4.296
Mem used: 7603920 Kb


	if (true) {
		Vector<Index<int> > v;
		v.SetCount(v_num);
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
		}
		Cout() << "Index<int> Add: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].UnlinkKey(i);
		}
		Cout() << "Index<int> Unlink: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
		Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
	
	if (true) {
		std::set<int>* v;
		v = new std::set<int>[v_num];
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].insert(i);
		}
		Cout() << "std::set<int> insert: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].erase(i);
		}
		Cout() << "std::set<int> erase: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}

Project is attached.

std::set<int> erase is three times faster than Index<int> Unlink.
After calling Index::Sweep even more memory is used. I guess this is a problem with the allocator.
And Index invalidates pointers.
So ...



Regards,
Novo
Re: Map implementation [message #51570 is a reply to message #51569] Wed, 17 April 2019 18:39 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Novo wrote on Wed, 17 April 2019 16:13
mirek wrote on Wed, 17 April 2019 02:54

Well, nothing surprising there, right?

Yes, but ...

$ ./test_ht_perf 
Mem used: 78128 Kb
Index<int> Add: 7.035
Index<int> Unlink: 12.272
Mem used: 2592740 Kb
Mem used after Sweep: 3800292 Kb
Mem used: 3769040 Kb
std::set<int> insert: 7.381
std::set<int> erase: 4.296
Mem used: 7603920 Kb


	if (true) {
		Vector<Index<int> > v;
		v.SetCount(v_num);
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Add(i);
		}
		Cout() << "Index<int> Add: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].UnlinkKey(i);
		}
		Cout() << "Index<int> Unlink: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
		Cout() << "Mem used after Sweep: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}
	
	if (true) {
		std::set<int>* v;
		v = new std::set<int>[v_num];
		const int isize = 100;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
		TimeStop ts;
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].insert(i);
		}
		Cout() << "std::set<int> insert: " << ts.ToString() << EOL;
		ts.Reset();
		for (int i = 0; i < isize; ++i) {
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].erase(i);
		}
		Cout() << "std::set<int> erase: " << ts.ToString() << EOL;
		Cout() << "Mem used: " << MemoryUsedKb() - curMU << " Kb" << EOL;
	}

Project is attached.

std::set<int> erase is three times faster than Index<int> Unlink.
After calling Index::Sweep even more memory is used. I guess this is a problem with the allocator.
And Index invalidates pointers.
So ...




Cool intresting catch.

Took me a while digging into the memory issue and it is really interesting - it looks like the problem is in this line

Quote:

void HashBase::FinishIndex()
{
int q = link.GetCount();
link.Reserve(hash.GetAlloc()); <==== HERE
link.AddN(hash.GetCount() - q);
for(int i = q; i < hash.GetCount(); i++)
LinkTo(i, link[i], hash[i] & UNSIGNED_HIBIT ? unlinked : Mapi(i));
}


If you comment it out, all works fine. The reason is that at that point, 'overreservation' of link in this particular leads to going from small blocks to large ones. Those small ones then are left free for further use, which in this example never materializes.

I will definitely keep this scenario for testing with the new Index... That said, this really is not likely to happen in real app.

Going to look into Unlink issue now.
Re: Map implementation [message #51813 is a reply to message #51570] Fri, 07 June 2019 13:58 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Index and allocator are now refactored, should behave better in synthetic benchmarks as is this one.
Re: Map implementation [message #51955 is a reply to message #51813] Tue, 25 June 2019 15:20 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
mirek wrote on Fri, 07 June 2019 07:58
Index and allocator are now refactored, should behave better in synthetic benchmarks as is this one.


It performs better, but Unlink is still ~2.5 times slower than std::set::erase.

Mem used: 39064 Kb
Index<int> Add: 5.691
Index<int> Unlink: 9.380
Mem used: 2959732 Kb
Index<int> Sweep: 0.204
Mem used after Sweep: 2959732 Kb
Index<int> Shrink: 0.122
Mem used after Shrink: 39096 Kb
Mem used: 46908 Kb
std::set<int> insert: 5.975
std::set<int> erase: 3.612
Mem used: 46904 Kb


Regards,
Novo
Re: Map implementation [message #51957 is a reply to message #51955] Wed, 26 June 2019 12:07 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Well, with benchmark constructed like this, beating set<int>::erase is accidentally hard.

Interesting things is that if you change the order of loops:

#include <Core/Core.h>
#include <set>

using namespace Upp;

CONSOLE_APP_MAIN
{
#ifdef _DEBUG
	const int v_num = 10000;
#else
	const int v_num = 100000;
#endif

	const int isize = 100;

	{
		Vector<Index<int> > v;
		v.SetCount(v_num);
		{
			RTIMING("FindAdd v_num outer");
			for (int j = 0; j < v_num; ++j)
				for (int i = 0; i < isize; ++i)
					v[j].FindAdd(i);
		}
		{
			RTIMING("UnlinkKey v_num outer");
			for (int j = 0; j < v_num; ++j)
				for (int i = 0; i < isize; ++i)
					v[j].UnlinkKey(i);
		}
		RTIMING("Sweep v_num outer");
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
	}
	{
		Vector<Index<int> > v;
		v.SetCount(v_num);
		{
			RTIMING("FindAdd v_num inner");
			for (int i = 0; i < isize; ++i)
				for (int j = 0; j < v_num; ++j)
					v[j].FindAdd(i);
		}
		{
			RTIMING("UnlinkKey v_num inner");
			for (int i = 0; i < isize; ++i)
				for (int j = 0; j < v_num; ++j)
					v[j].UnlinkKey(i);
		}
		RTIMING("Sweep v_num inner");
		const int jsize = v_num;
		for (int j = 0; j < jsize; ++j)
			v[j].Sweep();
	}

	{
		std::set<int> *v = new std::set<int>[v_num];
		{
			RTIMING("insert v_num outer");
			for (int j = 0; j < v_num; ++j)
				for (int i = 0; i < isize; ++i)
					v[j].insert(i);
		}
	
		{
			RTIMING("erase v_num outer");
			for (int j = 0; j < v_num; ++j)
				for (int i = 0; i < isize; ++i)
					v[j].erase(i);
		}
		delete[] v;
	}

	{
		std::set<int> *v = new std::set<int>[v_num];
		{
			RTIMING("insert v_num inner");
			for (int i = 0; i < isize; ++i)
				for (int j = 0; j < v_num; ++j)
					v[j].insert(i);
		}
	
		{
			RTIMING("erase v_num inner");
			for (int i = 0; i < isize; ++i)
				for (int j = 0; j < v_num; ++j)
					v[j].erase(i);
		}
		delete[] v;
	}
}


results are very different:

* C:\upp\out\benchmarks\MINGWx64\Index.exe 26.06.2019 11:36:42, user: cxl

TIMING erase v_num inner: 480.00 ms - 480.00 ms (480.00 ms / 1 ), min: 480.00 ms, max: 480.00 ms, nesting: 0 - 1
TIMING insert v_num inner: 702.00 ms - 702.00 ms (702.00 ms / 1 ), min: 702.00 ms, max: 702.00 ms, nesting: 0 - 1
TIMING erase v_num outer: 427.00 ms - 427.00 ms (427.00 ms / 1 ), min: 427.00 ms, max: 427.00 ms, nesting: 0 - 1
TIMING insert v_num outer: 399.00 ms - 399.00 ms (399.00 ms / 1 ), min: 399.00 ms, max: 399.00 ms, nesting: 0 - 1
TIMING Sweep v_num inner: 22.00 ms - 22.00 ms (22.00 ms / 1 ), min: 22.00 ms, max: 22.00 ms, nesting: 0 - 1
TIMING UnlinkKey v_num inner: 995.00 ms - 995.00 ms (995.00 ms / 1 ), min: 995.00 ms, max: 995.00 ms, nesting: 0 - 1
TIMING FindAdd v_num inner: 683.00 ms - 683.00 ms (683.00 ms / 1 ), min: 683.00 ms, max: 683.00 ms, nesting: 0 - 1
TIMING Sweep v_num outer: 28.00 ms - 28.00 ms (28.00 ms / 1 ), min: 28.00 ms, max: 28.00 ms, nesting: 0 - 1
TIMING UnlinkKey v_num outer: 118.00 ms - 118.00 ms (118.00 ms / 1 ), min: 118.00 ms, max: 118.00 ms, nesting: 0 - 1
TIMING FindAdd v_num outer: 242.00 ms - 242.00 ms (242.00 ms / 1 ), min: 242.00 ms, max: 242.00 ms, nesting: 0 - 1


which probably means that set<int> is more cache friendly in the original benchmark....
Re: Map implementation [message #51994 is a reply to message #51957] Mon, 01 July 2019 06:26 Go to previous messageGo to next message
Novo is currently offline  Novo
Messages: 893
Registered: December 2006
Experienced Contributor
Initial profiling showed this:

index.php?t=getfile&id=5869&private=0

Almost all time of Unlink is spent in inline dword FoldHash(dword h)

Most expensive are if-statements:

if(i >= 0)
if(key[i] == k) {

That is all I can tell at the moment ...


Regards,
Novo
Re: Map implementation [message #51999 is a reply to message #51994] Mon, 01 July 2019 17:53 Go to previous message
mirek is currently offline  mirek
Messages: 12126
Registered: November 2005
Ultimate Member
Things are quite different if instead of incremental pattern you feed in random data:

#include <Core/Core.h>
#include <set>

using namespace Upp;

CONSOLE_APP_MAIN
{
#ifdef _DEBUG
	const int v_num = 10000;
#else
	const int v_num = 1000;
#endif

	const int isize = 100;
	const int N = 100;
	
	Vector<int> data;
	for(int i = 0; i < isize * v_num; i++)
		data.Add(Random());

	for(int ii = 0; ii < N; ii++) {
		{
			Vector<Index<int> > v;
			v.SetCount(v_num);
			{
				RTIMING("inner FindAdd v_num");
				int *s = data;
				for (int i = 0; i < isize; ++i)
					for (int j = 0; j < v_num; ++j)
						v[j].FindAdd(*s++);
			}
			{
				RTIMING("inner UnlinkKey v_num");
				int *s = data;
				for (int i = 0; i < isize; ++i)
					for (int j = 0; j < v_num; ++j)
						v[j].UnlinkKey(*s++);
			}
			RTIMING("inner Sweep v_num");
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Sweep();
		}
		{
			Vector<Index<int> > v;
			v.SetCount(v_num);
			{
				RTIMING("outer FindAdd v_num");
				int *s = data;
				for (int j = 0; j < v_num; ++j)
					for (int i = 0; i < isize; ++i)
						v[j].FindAdd(*s++);
			}
			{
				RTIMING("outer UnlinkKey v_num");
				int *s = data;
				for (int j = 0; j < v_num; ++j)
					for (int i = 0; i < isize; ++i)
						v[j].UnlinkKey(*s++);
			}
			RTIMING("outer Sweep v_num");
			const int jsize = v_num;
			for (int j = 0; j < jsize; ++j)
				v[j].Sweep();
		}
	
		{
			std::set<int> *v = new std::set<int>[v_num];
			{
				RTIMING("outer insert v_num");
				int *s = data;
				for (int j = 0; j < v_num; ++j)
					for (int i = 0; i < isize; ++i)
						v[j].insert(*s++);
			}
		
			{
				RTIMING("outer erase v_num");
				int *s = data;
				for (int j = 0; j < v_num; ++j)
					for (int i = 0; i < isize; ++i)
						v[j].erase(*s++);
			}
			delete[] v;
		}
	
		{
			std::set<int> *v = new std::set<int>[v_num];
			{
				RTIMING("inner insert v_num");
				int *s = data;
				for (int i = 0; i < isize; ++i)
					for (int j = 0; j < v_num; ++j)
						v[j].insert(*s++);
			}
		
			{
				RTIMING("inner erase v_num");
				int *s = data;
				for (int i = 0; i < isize; ++i)
					for (int j = 0; j < v_num; ++j)
						v[j].erase(*s++);
			}
			delete[] v;
		}
	}
}


I guess incremental data somehow favors set, my guess is that it works as accidental prefetch here...

[Updated on: Mon, 01 July 2019 17:53]

Report message to a moderator

Previous Topic: [Skylark][witz] How would you do this
Next Topic: Strange Architecture
Goto Forum:
  


Current Time: Mon Dec 16 04:04:17 CET 2019

Total time taken to generate the page: 0.01173 seconds