U++ framework
Re: BufferPainter::Clear() optimization [message #53962 is a reply to message #53961] Mon, 18 May 2020 13:31
Tom1
Hi,

Alignment corrected. (Obviously this meant rearranging a lot of things to keep everything balanced.) The threshold for switching to streaming stores is still at 8M (2*1024*1024 dwords), but feel free to experiment.

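inline void new_memset128(void *b, dword data, int len){
	// Tiny fills (0..5 dwords): plain stores, deliberate fall-through.
	switch(len){
		case 5: ((dword *)b)[4] = data;
		case 4: ((dword *)b)[3] = data;
		case 3: ((dword *)b)[2] = data;
		case 2: ((dword *)b)[1] = data;
		case 1: ((dword *)b)[0] = data;
		case 0: return;
	}
	
	__m128i q = _mm_set1_epi32((int)data);
	__m128i *w = (__m128i*)b;
	__m128i *e = (__m128i*)b + (len>>2); // end of the whole 16-byte blocks, counted from b

	if(len <= 2*1024*1024 || ((uintptr_t)b&3)){
		// Up to 2M dwords (8 MB), or b not dword-aligned: unaligned stores, two per iteration.
		while(w<e-1){
			_mm_storeu_si128(w++, q);
			_mm_storeu_si128(w++, q);
		}
		if(w<e) _mm_storeu_si128(w++, q);
	}
	else{
		// Large fill: s = number of dwords (0..3) up to the next 16-byte boundary.
		int s=(-((int)((uintptr_t)b)>>2))&0x3;
		switch(s){
			case 3: ((dword *)b)[2] = data;
			case 2: ((dword *)b)[1] = data;
			case 1: ((dword *)b)[0] = data;
		}
		
		w = (__m128i*) ((dword*)b + s);
		e = w + ((len-s)>>2); // only the aligned blocks that fit entirely inside the buffer
		
		// Non-temporal stores bypass the cache; the fence orders them before later stores.
		while(w<e) _mm_stream_si128(w++, q);
		_mm_sfence();
	}

	// At most 3 dwords are left and len >= 6 here, so a single unaligned store
	// that overlaps the already filled part covers the tail in both branches.
	_mm_storeu_si128((__m128i*)((dword *)b + len - 4), q);
}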
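To make the alignment arithmetic easier to follow, here is a minimal sketch of how a fill splits into head, aligned blocks and tail. SplitFill and FillSplit are hypothetical names made up for illustration, not part of U++, and the sketch assumes a dword-aligned start address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of U++): describes how a fill of `len` dwords
// starting at address `addr` breaks down around 16-byte boundaries.
struct FillSplit {
	int head;    // leading dwords written one by one (0..3)
	int blocks;  // whole, 16-byte-aligned 128-bit stores
	int tail;    // trailing dwords left over (0..3)
};

inline FillSplit SplitFill(std::uintptr_t addr, int len)
{
	assert((addr & 3) == 0 && len >= 4); // assumption: dword-aligned, at least one vector long
	FillSplit f;
	// Dwords from addr up to the next 16-byte boundary; same value as (-(addr >> 2)) & 3.
	f.head = (int)((4 - ((addr >> 2) & 3)) & 3);
	f.blocks = (len - f.head) >> 2;
	f.tail = (len - f.head) & 3;
	return f;
}
```

For example, 16 dwords starting 4 bytes past a 16-byte boundary give head 3, 3 blocks and tail 1, even though 16 & 3 is 0. The tail depends on the alignment as well as on the length.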

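For anyone experimenting with the threshold or the alignment handling, a guard-word check along these lines may help catch writes outside the buffer. This is only a sketch: CheckFill and ScalarFill are hypothetical names, and the dword typedef is a stand-in that assumes U++'s dword is a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint32_t dword; // stand-in for U++'s dword (assumed 32-bit unsigned)

// Hypothetical reference fill used to demonstrate the checker;
// pass the routine under test instead.
inline void ScalarFill(void *b, dword data, int len)
{
	dword *p = (dword *)b;
	for(int i = 0; i < len; i++)
		p[i] = data;
}

// Returns true if `fill` writes exactly `len` dwords for every length 0..max_len
// at four consecutive dword offsets, leaving the guard dwords on both sides untouched.
// Four consecutive offsets cover every dword position within a 16-byte block.
inline bool CheckFill(void (*fill)(void *, dword, int), int max_len)
{
	const dword GUARD = 0xDEADBEEF, VALUE = 0x12345678;
	const int PAD = 8; // guard dwords before and after the filled range
	for(int offset = 0; offset < 4; offset++)
		for(int len = 0; len <= max_len; len++) {
			std::vector<dword> buf(PAD + offset + len + PAD, GUARD);
			fill(buf.data() + PAD + offset, VALUE, len);
			for(int i = 0; i < (int)buf.size(); i++) {
				bool inside = i >= PAD + offset && i < PAD + offset + len;
				if(buf[i] != (inside ? VALUE : GUARD))
					return false;
			}
		}
	return true;
}
```

Sweeping every length only reaches the small-fill and unaligned-store paths in reasonable time. To exercise the streaming path, one option is to lower the threshold temporarily while testing.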

Best regards,

Tom
 