Re: BufferPainter::Clear() optimization [message #53957 is a reply to message #53953] Sun, 17 May 2020 23:25
Tom1
Messages: 1303
Registered: March 2007
Ultimate Contributor
Mirek,

Please check out this one. It performs better on MSBT19 / MSBT19x64 for small counts, and works well on CLANG/CLANGx64 too:
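// Fills len 32-bit words (len counts dwords, not bytes) at b with data.
// Needs <emmintrin.h> (SSE2) and <stdint.h>; dword is U++'s 32-bit unsigned type.
inline void new_memset128(void *b, dword data, int len){
	
	// Short fills: plain dword stores with deliberate fall-through.
	switch(len){
		case 4: ((dword *)b)[3] = data;
		case 3: ((dword *)b)[2] = data;
		case 2: ((dword *)b)[1] = data;
		case 1: ((dword *)b)[0] = data;
		case 0: return;
	}
	
	__m128i q = _mm_set1_epi32(*(int*)&data);
	__m128i *w = (__m128i*)b;
	
	switch(len>>2){
		default:{
			// e points at the first of the last four whole vectors.
			__m128i *e = (__m128i*)b + (len>>2) - 4;
			// _mm_stream_si128 requires a 16-byte aligned address, so it is used
			// only for large fills (over 2M dwords) into an aligned buffer.
			if(len <= 2*1024*1024 || ((uintptr_t)b & 15)){
				while(w<e){
					_mm_storeu_si128(w++, q);
					_mm_storeu_si128(w++, q);
					_mm_storeu_si128(w++, q);
					_mm_storeu_si128(w++, q);
				}
			}
			else{
				while(w<e){
					_mm_stream_si128(w++, q);
					_mm_stream_si128(w++, q);
					_mm_stream_si128(w++, q);
					_mm_stream_si128(w++, q);
				}
				_mm_sfence(); // order the non-temporal stores before later stores
			}
			// The loop can overshoot e by up to three vectors. Rewind so the four
			// tail stores below overlap what is already written instead of running
			// past the end of the buffer.
			w = e;
		}
		case 4: _mm_storeu_si128(w++, q);
		case 3: _mm_storeu_si128(w++, q);
		case 2: _mm_storeu_si128(w++, q);
		case 1: _mm_storeu_si128(w++, q);
	}
	// Remaining 1 to 3 dwords after the last whole vector.
	switch(len&3){
		case 3: ((dword *)b)[len-3] = data;
		case 2: ((dword *)b)[len-2] = data;
		case 1: ((dword *)b)[len-1] = data;
	}
}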
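
The tail handling in a fill routine like this is easy to get wrong, so a guard-region check over many lengths can be worth running before benchmarking. The sketch below is not the SSE2 code above: `store4`, `fill32_sketch` and `fill_is_exact` are hypothetical names, and the 16-byte stores are done with `std::memcpy` as a portable stand-in for `_mm_storeu_si128`. It only illustrates the idea of a block loop followed by one overlapping tail store, plus a checker that verifies nothing outside the requested range is touched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: write four copies of data at p (16 bytes, no alignment needed).
static inline void store4(uint32_t *p, uint32_t data) {
    const uint32_t block[4] = { data, data, data, data };
    std::memcpy(p, block, sizeof block);
}

// Fill len 32-bit words (len counts dwords, not bytes).
inline void fill32_sketch(uint32_t *b, uint32_t data, int len) {
    assert(len >= 0);
    if (len < 4) {                      // too short for one 16-byte block
        for (int i = 0; i < len; i++)
            b[i] = data;
        return;
    }
    uint32_t *last = b + len - 4;       // start of the final block
    for (uint32_t *w = b; w < last; w += 4)
        store4(w, data);                // whole blocks, always inside [b, b + len)
    store4(last, data);                 // tail block, may overlap earlier writes
}

// Returns true if a fill of len words changes exactly b[0..len) and nothing else.
inline bool fill_is_exact(int len) {
    const int guard = 8;                // canary words on each side of the target
    const uint32_t canary = 0xDEADBEEFu, value = 0x11223344u;
    std::vector<uint32_t> buf(len + 2 * guard, canary);
    fill32_sketch(buf.data() + guard, value, len);
    for (int i = 0; i < (int)buf.size(); i++) {
        bool inside = i >= guard && i < guard + len;
        if (buf[i] != (inside ? value : canary))
            return false;
    }
    return true;
}
```

Calling `fill_is_exact(len)` for every `len` from 0 up to a few hundred exercises all combinations of block count and remainder. The same checker can be pointed at any other fill function with this signature by swapping the call inside it.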



Best regards,

Tom

EDIT: Fine tuning...

[Updated on: Sun, 17 May 2020 23:55]
