Overview
Examples
Screenshots
Comparisons
Applications
Download
Documentation
Tutorials
Bazaar
Status & Roadmap
FAQ
Authors & License
Forums
Funding Ultimate++
Search on this site
Search in forums












SourceForge.net Logo
Home » U++ Library support » U++ Core » Writing Bits object to disk
Writing Bits object to disk [message #47390] Wed, 11 January 2017 17:07 Go to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
Hello,

I was thinking about using Bits as an efficient data structure to write my data to disk. However, I noticed that this data structure is quite closed and does not allow callers to retrieve a pointer to the internal buffer of bits and neither does it allow itself being constructed from existing buffer and alloc variables.

Would it be a good idea to allow this, or have a similar data structure that allows retrieval of the data structure? If not, why do you think so? I now manually edited some support in, to see if my efficient idea works out well!

Thanks!

crydev
Re: Writing Bits object to disk [message #47396 is a reply to message #47390] Thu, 12 January 2017 01:20 Go to previous messageGo to next message
mr_ped is currently offline  mr_ped
Messages: 777
Registered: November 2005
Location: Czech Republic - Praha
Veteran

If your target is small disk space, you should rather allocate data in memory in reasonable common sizes, semantically grouped together, and run that through zlib compression, if those data resemble at least some patterns, this should save lot more, than packing them into bits and later having headache when you will want to extend them a bit.
Re: Writing Bits object to disk [message #47400 is a reply to message #47396] Thu, 12 January 2017 09:03 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
mr_ped wrote on Thu, 12 January 2017 01:20
If your target is small disk space, you should rather allocate data in memory in reasonable common sizes, semantically grouped together, and run that through zlib compression, if those data resemble at least some patterns, this should save lot more, than packing them into bits and later having headache when you will want to extend them a bit.


For random values this would be a good idea, but not for structured data that I want to write out. I have a sorted set of memory addresses, identified by the memory page they reside in. I can reduce space by at most 32 times if I save it in bits, and at most 8 times (in x64) if I use a Vector<bool>. Compression takes too much time, I tried. Smile
Re: Writing Bits object to disk [message #47411 is a reply to message #47390] Fri, 13 January 2017 08:39 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
crydev wrote on Wed, 11 January 2017 17:07
Hello,

I was thinking about using Bits as an efficient data structure to write my data to disk. However, I noticed that this data structure is quite closed and does not allow callers to retrieve a pointer to the internal buffer of bits and neither does it allow itself being constructed from existing buffer and alloc variables.

Would it be a good idea to allow this, or have a similar data structure that allows retrieval of the data structure? If not, why do you think so? I now manually edited some support in, to see if my efficient idea works out well!

Thanks!

crydev


I guess this is valid idea. Adding to RM.

Mirek
Re: Writing Bits object to disk [message #47457 is a reply to message #47411] Wed, 18 January 2017 08:51 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
mirek wrote on Fri, 13 January 2017 08:39
crydev wrote on Wed, 11 January 2017 17:07
Hello,

I was thinking about using Bits as an efficient data structure to write my data to disk. However, I noticed that this data structure is quite closed and does not allow callers to retrieve a pointer to the internal buffer of bits and neither does it allow itself being constructed from existing buffer and alloc variables.

Would it be a good idea to allow this, or have a similar data structure that allows retrieval of the data structure? If not, why do you think so? I now manually edited some support in, to see if my efficient idea works out well!

Thanks!

crydev


I guess this is valid idea. Adding to RM.

Mirek


Great, thanks!

crydev
Re: Writing Bits object to disk [message #47926 is a reply to message #47457] Mon, 24 April 2017 19:19 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
Hello,

I changed the Bits class to expose its buffer and allocation variables, and ported my code using Vector<bool> to its equivalent using Bits. However, I realized that setting billions of bits in hot loops is very inefficient in Bits. The speedup I get from having to write 8 times less data to disk is eliminated by the slow computations.

Why is it so inefficient? Would it be a good idea to keep using Vector<bool>, and convert to Bits before writing to the file?

Thanks,

crydev
Re: Writing Bits object to disk [message #47927 is a reply to message #47926] Mon, 24 April 2017 21:08 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
crydev wrote on Mon, 24 April 2017 19:19
Hello,

I changed the Bits class to expose its buffer and allocation variables, and ported my code using Vector<bool> to its equivalent using Bits. However, I realized that setting billions of bits in hot loops is very inefficient in Bits. The speedup I get from having to write 8 times less data to disk is eliminated by the slow computations.

Why is it so inefficient? Would it be a good idea to keep using Vector<bool>, and convert to Bits before writing to the file?

Thanks,

crydev


I guess it depends on what you do. What is ratio between Vector<bool> speed and Bits?

(This is actually a sort of nice problem to optimize Smile

Mirek
Re: Writing Bits object to disk [message #47933 is a reply to message #47927] Tue, 25 April 2017 09:36 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
A ran the following code in a performance test 100,000 times.

// Original function implementing Vector<bool>.
const int VectorBoolOrBitsetTestOriginal(bool* const buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer[i] = rand;
	}
	return x;
}

// Different function implementing Bits.
const int VectorBoolOrBitsetTestBitSet(Bits& buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer.Set(i, rand);
	}
	return x;
}

// Different function implementing std::bitset.
const int VectorBoolOrBitsetTestStdBitSet(std::bitset<4096>& buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer.set(i, rand);
	}
	return x;
}


The result is as follows, Bits being approximately a factor 10 slower. std::bitset already seems to be a twice as fast:

index.php?t=getfile&id=5246&private=0

crydev
  • Attachment: Capture.PNG
    (Size: 9.65KB, Downloaded 149 times)

[Updated on: Tue, 25 April 2017 09:45]

Report message to a moderator

Re: Writing Bits object to disk [message #47934 is a reply to message #47933] Tue, 25 April 2017 09:48 Go to previous messageGo to next message
dolik.rce is currently offline  dolik.rce
Messages: 1740
Registered: August 2008
Location: Czech Republic
Senior Veteran

By the way, I just read an article on very similar topic: http://www.bfilipek.com/2017/04/packing-bools.html. It might contain some useful insights, especially in the second part which was not published yet.

Best regards,
Honza
Re: Writing Bits object to disk [message #47935 is a reply to message #47933] Tue, 25 April 2017 10:31 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
crydev wrote on Tue, 25 April 2017 09:36
A ran the following code in a performance test 100,000 times.

// Original function implementing Vector<bool>.
const int VectorBoolOrBitsetTestOriginal(bool* const buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer[i] = rand;
	}
	return x;
}

// Different function implementing Bits.
const int VectorBoolOrBitsetTestBitSet(Bits& buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer.Set(i, rand);
	}
	return x;
}

// Different function implementing std::bitset.
const int VectorBoolOrBitsetTestStdBitSet(std::bitset<4096>& buffer, const bool* const rand)
{
	int x = 0;
	for (int i = 0; i < 4096; ++i)
	{
		if (rand)
		{
			++x;
		}
		buffer.set(i, rand);
	}
	return x;
}


The result is as follows, Bits being approximately a factor 10 slower. std::bitset already seems to be a twice as fast:

index.php?t=getfile&id=5246&private=0

crydev


The first quick observation of Bits code (after all these years) reveals that there are pretty good oportunities to optimize this. Woohoo, optimization time!

BTW, could you please post me your benchamrk package zipped? Would save me a bit of time.

Also, I am still undecided about correct interface to expose raw data. Currectly I am thinking along something like:

const byte *ReadRaw(int& count_of_bytes);
byte *WriteRaw(int count_of_bytes);


WriteRaw would make sure that there is at least count bytes available for bits (without reallocating array down).

Mirek
Re: Writing Bits object to disk [message #47936 is a reply to message #47935] Tue, 25 April 2017 10:34 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
P.S.: Are you sure that your Test code is doing what you wanted it to do? It looks to me like there should be something like *rand++ in it...
Re: Writing Bits object to disk [message #47937 is a reply to message #47936] Tue, 25 April 2017 11:37 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
mirek wrote on Tue, 25 April 2017 10:34
P.S.: Are you sure that your Test code is doing what you wanted it to do? It looks to me like there should be something like *rand++ in it...


The above testing code is a subset of my real code, where x is a value I just introduced to have a return value to show. It does not represent anything, and shouldn't be taken into account. The rand variable is a pointer to an array of random bools. I wanted to use it to create some randomness, but I didn't really succeed there. Very Happy I solely wanted to point out the runtime of the different implementations. Smile

As Bits optimization, I also figured that there is no Reserve(int) function to pre-allocate the internal buffer. I think it would be a good idea to add such, because it sharply reduces the number of reallocates necessary. I also think that setting a bitmask can be vectorized. However, that would be a task for myself, because U++ is made to be portable. Smile

Thanks,

crydev

[Updated on: Tue, 25 April 2017 11:39]

Report message to a moderator

Re: Writing Bits object to disk [message #47938 is a reply to message #47937] Tue, 25 April 2017 11:59 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
crydev wrote on Tue, 25 April 2017 11:37
mirek wrote on Tue, 25 April 2017 10:34
P.S.: Are you sure that your Test code is doing what you wanted it to do? It looks to me like there should be something like *rand++ in it...


The above testing code is a subset of my real code, where x is a value I just introduced to have a return value to show. It does not represent anything, and shouldn't be taken into account. The rand variable is a pointer to an array of random bools. I wanted to use it to create some randomness, but I didn't really succeed there. Very Happy I solely wanted to point out the runtime of the different implementations. Smile


I have no problem with 'x', but it looks to like you are setting all bits to 1 (because you are actually not reading rand values).

Mirek
Re: Writing Bits object to disk [message #47939 is a reply to message #47938] Tue, 25 April 2017 12:38 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
Actually, I think this might be the reason for value you are getting.

I am benchmarking with this:

CONSOLE_APP_MAIN
{
	Vector<bool> data;
	for(int i = 0; i < 100000; i++)
		data.Add(Random() & 1);
	
	int N = 1000;
	for(int k = 0; k < N; k++) {
		{
			RTIMING("Vector<bool>");
			Vector<bool> h;
			for(int j = 0; j < data.GetCount(); j++)
				h.At(j, false) = data[j];
		}
		{
			RTIMING("Bits");
			Bits h;
			for(int j = 0; j < data.GetCount(); j++)
				h.Set(j, data[j]);
		}
	}
}


This showed Bits to be only about 15% slower than Vector<bool>.

Then I have tried some very basic optimization (reorganize with inline Set) and suddenly, Bits are about 20% FASTER than Vector.

I hope I am not getting anything wrong...

Any ideas about the proper "raw data" interface?
Re: Writing Bits object to disk [message #47940 is a reply to message #47939] Tue, 25 April 2017 13:35 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
Thanks for your replies Mirek! I fixed my testcases, I realized I had a wrong check in my loop. The results now are:

index.php?t=getfile&id=5247&private=0

Then, I applied your optimizations, and Bits became a little faster. However, It still is not as fast as Vector<bool>.

index.php?t=getfile&id=5248&private=0

Quote:
Any ideas about the proper "raw data" interface?


I thought about making a constructor that allows construction of Bits from an existing buffer.

I also thought about vectorizing Bits set method. In theory, we could gather 16 bools, invert the bits in this bool, such that the most significant bit is 1 if the bool value is true, and 0 if it is false. Then, the _mm_movemask_epi8 intrinsic will generate an instruction that directly converts these 16 bools to a bitmask. We can also assume that 0x80 is true, for our inverted bool. Smile

crydev
  • Attachment: new.PNG
    (Size: 9.66KB, Downloaded 122 times)
  • Attachment: Capture.PNG
    (Size: 9.73KB, Downloaded 121 times)

[Updated on: Tue, 25 April 2017 13:43]

Report message to a moderator

Re: Writing Bits object to disk [message #47941 is a reply to message #47940] Tue, 25 April 2017 13:39 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
It is still weird that you are getting different numbers than me.

Could you perhaps try my benchmark?

Are you benchmarking "release" mode?

What CPU / Compiler are you using? Do you have latest theide (with FAST release mode always on)?
Re: Writing Bits object to disk [message #47942 is a reply to message #47941] Tue, 25 April 2017 14:03 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
mirek wrote on Tue, 25 April 2017 13:39
It is still weird that you are getting different numbers than me.

Could you perhaps try my benchmark?

Are you benchmarking "release" mode?

What CPU / Compiler are you using? Do you have latest theide (with FAST release mode always on)?


I updated my TheIDE to the latest version, but it did not make a difference. I am using the Visual C++ compiler from Visual Studio 2015. My CPU is a Core i7 2600k. I compiled with Release mode, and the following compiler flags: -O2 /GS- /Qvec-report:2

What is FAST release mode? I also tried your RTIMING option, but it gives me the same results as my own measurement. Smile
Re: Writing Bits object to disk [message #47943 is a reply to message #47942] Tue, 25 April 2017 16:40 Go to previous messageGo to next message
mirek is currently offline  mirek
Messages: 11051
Registered: November 2005
Ultimate Member
crydev wrote on Tue, 25 April 2017 14:03
mirek wrote on Tue, 25 April 2017 13:39
It is still weird that you are getting different numbers than me.

Could you perhaps try my benchmark?

Are you benchmarking "release" mode?

What CPU / Compiler are you using? Do you have latest theide (with FAST release mode always on)?


I updated my TheIDE to the latest version, but it did not make a difference. I am using the Visual C++ compiler from Visual Studio 2015. My CPU is a Core i7 2600k. I compiled with Release mode, and the following compiler flags: -O2 /GS- /Qvec-report:2

What is FAST release mode? I also tried your RTIMING option, but it gives me the same results as my own measurement. Smile


Weird the only difference seems to be CPU (i7 4771 here)...

Have you tried my benchmark as it is?

That said, even if those numbers you are getting were real, I guess it is now close to Vector<bool> anyway.
Re: Writing Bits object to disk [message #47948 is a reply to message #47943] Wed, 26 April 2017 08:27 Go to previous messageGo to next message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
mirek wrote on Tue, 25 April 2017 16:40
crydev wrote on Tue, 25 April 2017 14:03
mirek wrote on Tue, 25 April 2017 13:39
It is still weird that you are getting different numbers than me.

Could you perhaps try my benchmark?

Are you benchmarking "release" mode?

What CPU / Compiler are you using? Do you have latest theide (with FAST release mode always on)?


I updated my TheIDE to the latest version, but it did not make a difference. I am using the Visual C++ compiler from Visual Studio 2015. My CPU is a Core i7 2600k. I compiled with Release mode, and the following compiler flags: -O2 /GS- /Qvec-report:2

What is FAST release mode? I also tried your RTIMING option, but it gives me the same results as my own measurement. Smile


Weird the only difference seems to be CPU (i7 4771 here)...

Have you tried my benchmark as it is?

That said, even if those numbers you are getting were real, I guess it is now close to Vector<bool> anyway.


I haven't tried your benchmark, I could try that too. I see that Bits is coming closer to Vector<bool> in set performance, but at http://www.bfilipek.com/2017/04/packing-bools.html they provided a method that is even faster. How do you feel about a Reserve(int) function for Bits?
(Good that Upp:Bits is now faster than std::bitset btw Smile.) Since I set billions of bits, maybe a vector set method accepting 16 bools (with value 0x80) is an even bigger performance improvement.

Thanks,

crydev
Re: Writing Bits object to disk [message #47950 is a reply to message #47948] Wed, 26 April 2017 08:50 Go to previous messageGo to previous message
crydev is currently offline  crydev
Messages: 149
Registered: October 2012
Location: Netherlands
Experienced Member
Mirek,

I ran your benchmark, and I found out why I get different results! Your benchmark gives me:

TIMING Bits           : 194.98 ms - 194.98 us (195.00 ms / 1000 ), min:  0.00 ns, max:  1.00 ms, nesting: 1 - 1000
TIMING Vector<bool>   : 362.98 ms - 362.98 us (363.00 ms / 1000 ), min:  0.00 ns, max:  1.00 ms, nesting: 1 - 1000


Constructing the Vector<bool> inside the loop also slows down the whole thing by a lot. After I moved the construction out, I got:

TIMING Bits           : 183.98 ms - 183.98 us (184.00 ms / 1000 ), min:  0.00 ns, max:  1.00 ms, nesting: 1 - 1000
TIMING Vector<bool>   : 102.98 ms - 102.98 us (103.00 ms / 1000 ), min:  0.00 ns, max:  1.00 ms, nesting: 1 - 1000


I pre-allocate the Vector<bool> with the Reserve(int) function, because I know what the size of the vector is going to be. This is also the case in my production application. That's why the Vector is still faster. We need a reserve function for Bits too. Smile

crydev

[Updated on: Wed, 26 April 2017 08:54]

Report message to a moderator

Previous Topic: RegExp this'n that
Next Topic: stable sort bug.. or looks like it
Goto Forum:
  


Current Time: Sun Jun 25 22:48:12 CEST 2017

Total time taken to generate the page: 0.00539 seconds