is memsetex really at optimal speed? [message #60344] Tue, 19 December 2023 02:18
piotr5
It seems this function just copies the object over and over, one item at a time. On the plus side, that's quite "atomic": no object remains half-copied for long. But I'm not sure the speed really scales when initializing larger data collections.

What I imagined it should do is something like this (I'm hereby giving permission to use the code below):
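inline
void memsetyx(void *t, const void *item, int item_size, int count) {
	ASSERT(item_size >= 0);
	// Small fills: fall back to the existing per-item routine.
	if(count < 3 || count * item_size < 64) {
		memsetex(t, item, item_size, count);
		return;
	}
	byte *q = (byte *)t;
	byte *tt = q; // start of the destination
	// Seed the destination with whole items until at least 64 bytes are filled.
	while(q - tt < 64) {
		memcpy8(q, item, item_size);
		q += item_size;
		--count;
	}
	// Intended to replicate the seeded prefix over the rest; source and destination overlap here.
	memcpy8(q, tt, qword(count) * item_size);
	// Copies one 16-byte block as a tail fix-up (see the note below).
	memcpy128(tt + item_size * count - 16, q - 16, 1);
}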

(where the last line could have been avoided if memcpy8__ would perform the
Copy128(len - 16);
right before the return-statement underneath and at the end.)

I haven't tested it, though. Would that work? Is it faster on various platforms? Maybe bigger constants should be used in the if-statement at the beginning? AFAIK standard memcpy does allow the source and destination regions to overlap, am I wrong?

Admittedly it is rarely necessary to initialize an array with lots of copies of complicated objects, but I could imagine it happening in some prototype code. In production code such things likely get optimized out by the programmers, though. So this really is not a request to change anything; I'm just asking whether this was ever considered and how the discussion went...
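
On the overlap question: in standard C and C++, memcpy has undefined behavior when the source and destination regions overlap. memmove does permit overlap, but it copies as if through a temporary buffer, so it would not replicate a pattern forward. Whether U++'s memcpy8 tolerates overlap depends on its implementation, which is not checked here. A minimal sketch of the same "seed, then bulk-copy" idea that avoids overlap altogether is shown below. It uses only standard C++; the name fill_pattern is hypothetical and not part of U++, and it assumes that item does not point into the destination and that item_size * count does not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of U++: writes `count` consecutive copies of
// the `item_size`-byte object at `item` into `dst`, using only
// non-overlapping std::memcpy calls.
inline void fill_pattern(void *dst, const void *item,
                         std::size_t item_size, std::size_t count)
{
    if(item_size == 0 || count == 0)
        return;
    unsigned char *base = static_cast<unsigned char *>(dst);
    const std::size_t total = item_size * count; // assumed not to overflow
    std::memcpy(base, item, item_size);          // seed: a single item
    std::size_t filled = item_size;              // always a multiple of item_size
    while(filled < total) {
        // Copy from the filled prefix [0, n) to [filled, filled + n).
        // Since n <= filled, the two ranges never overlap.
        const std::size_t rest = total - filled;
        const std::size_t n = filled < rest ? filled : rest;
        std::memcpy(base + filled, base, n);
        filled += n;
    }
}
```

Each pass doubles the filled region, so the number of memcpy calls grows with the logarithm of count rather than with count itself. Whether that is actually faster than the per-item loop would still have to be measured.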

[Updated on: Tue, 19 December 2023 02:26]


Re: is memsetex really at optimal speed? [message #60410 is a reply to message #60344] Thu, 04 January 2024 19:25
mirek
You can never tell before you benchmark it... did you?

I remember doing something like that in the past, and I believe it was not worth it. I might be wrong...
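
For anyone who wants to run that benchmark, a rough timing harness might look like the sketch below. It is plain standard C++, not U++ code. The names FillFn, fill_per_item and time_fill_ns are hypothetical, and fill_per_item is only a stand-in baseline with the same argument order as memsetex, not the real implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical signature for a fill routine under test.
using FillFn = void (*)(void *dst, const void *item,
                        std::size_t item_size, std::size_t count);

// Stand-in baseline: one std::memcpy per item.
inline void fill_per_item(void *dst, const void *item,
                          std::size_t item_size, std::size_t count)
{
    unsigned char *q = static_cast<unsigned char *>(dst);
    for(std::size_t i = 0; i < count; i++, q += item_size)
        std::memcpy(q, item, item_size);
}

// Returns the average time of one call to `fn`, in nanoseconds.
inline double time_fill_ns(FillFn fn, std::size_t item_size,
                           std::size_t count, int repeats)
{
    if(item_size == 0 || count == 0 || repeats <= 0)
        return 0.0;
    std::vector<unsigned char> item(item_size, 0xAB);
    std::vector<unsigned char> buf(item_size * count);
    volatile unsigned char sink = 0; // discourages the optimizer from dropping the fills
    const auto start = std::chrono::steady_clock::now();
    for(int r = 0; r < repeats; r++) {
        fn(buf.data(), item.data(), item_size, count);
        sink = buf[static_cast<std::size_t>(r) % buf.size()];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / repeats;
}
```

Calling time_fill_ns(fill_per_item, 24, 100000, 200) and then the same with a candidate routine gives two numbers to compare. Results will vary with compiler, optimization level, item size and cache effects, so several sizes and counts should be tried before drawing conclusions.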

Mirek