Overview
Examples
Screenshots
Comparisons
Applications
Download
Documentation
Tutorials
Bazaar
Status & Roadmap
FAQ
Authors & License
Forums
Funding Ultimate++
Search on this site
Search in forums












SourceForge.net Logo
Home » Developing U++ » U++ Developers corner » SSE2 and SVO optimization (Painter, memcpy....)
Re: BufferPainter::Clear() optimization [message #53963 is a reply to message #53961] Mon, 18 May 2020 13:33 Go to previous messageGo to previous message
mirek is currently offline  mirek
Messages: 13975
Registered: November 2005
Ultimate Member
So I kept digging and found that the reason why Fill3 performed great with CLANG and less great with MSC was that CLANG understand what I mean with that ugly array based code to fill 4 color values into the xmm register while MSC really created that 'm' array, stored 4 values into memory and then fetched them into xmm... Smile

Fixed Fill3 seems to perform well with MSC too:

void Fill3(RGBA *t, RGBA c, int len)
{
	__m128i val4 = _mm_set1_epi32(*(int*)&c);
	auto Set4 = [&](int at) { _mm_storeu_si128((__m128i *)(t + at), val4); };
	auto Set4S = [&](int at) { _mm_stream_si128((__m128i *)(t + at), val4); };
	if(len >= 16) {
		if(len > 1024*1024 / 16 && ((uintptr_t)t & 3) == 0) { // for really huge data, bypass the cache
			while((uintptr_t)t & 15) { // align to 16 bytes for SSE
				*t++ = c;
				len--;
			}
			do {
				Set4S(0);
				Set4S(4);
				Set4S(8);
				Set4S(12);
				t += 16;
				len -= 16;
			}
			while(len >= 16);
			_mm_sfence();
		}
		else
			do {
				Set4(0);
				Set4(4);
				Set4(8);
				Set4(12);
				t += 16;
				len -= 16;
			}
			while(len >= 16);
	}
	if(len & 8) {
		Set4(0);
		Set4(4);
		t += 8;
	}
	if(len & 4) {
		Set4(0);
		t += 4;
	}
	if(len & 2) {
		t[0] = t[1] = c;
		t += 2;
	}
	if(len & 1)
		t[0] = c;
}


Mirek
 
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic: Should we still care about big-endian CPUs?
Next Topic: TheIDE crash after switching package
Goto Forum:
  


Current Time: Mon May 06 03:06:11 CEST 2024

Total time taken to generate the page: 0.01894 seconds