SSE2 and SVO optimization (Painter, memcpy....)
Re: BufferPainter::Clear() optimization [message #54106 is a reply to message #54099] Mon, 01 June 2020 11:24
omari
Messages: 276
Registered: March 2010
Experienced Member
In uint64 memhash32(const void *ptr, int len), I suggest

while(len >= 16) {

instead of
while(len >= 32) {



regards
omari.
Re: BufferPainter::Clear() optimization [message #54111 is a reply to message #54106] Mon, 01 June 2020 15:47
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
Well, that is intentional: the final combining step is not worth the effort unless there is more data to process.

In the end, the 32-bit variant is, for now:

hash_t memhash(const void *ptr, size_t len)
{
	const byte *s = (byte *)ptr;
	dword val = HASH32_CONST1;
	if(len >= 4) {
		if(len >= 16) {
			// two independent accumulators shorten the multiply dependency chain
			dword val1, val2;
			val1 = val2 = HASH32_CONST1;
			while(len >= 8) {
				val1 = HASH32_CONST2 * val1 + *(dword *)(s);
				val2 = HASH32_CONST2 * val2 + *(dword *)(s + 4);
				s += 8;
				len -= 8;
			}
			// final combining of both lanes into the main hash
			val = HASH32_CONST2 * val + val1;
			val = HASH32_CONST2 * val + val2;
		}
		// remaining whole words; the last word is read so that it ends exactly
		// at the end of the block and may overlap bytes already hashed
		const byte *e = s + len - 4;
		while(s < e) {
			val = HASH32_CONST2 * val + *(dword *)(s);
			s += 4;
		}
		return HASH32_CONST2 * val + *(dword *)(e);
	}
	if(len >= 2) {
		// 2 or 3 bytes: two (possibly overlapping) 16-bit reads
		val = HASH32_CONST2 * val + *(word *)(s);
		val = HASH32_CONST2 * val + *(word *)(s + len - 2);
		return val;
	}
	// 0 or 1 byte
	return len ? HASH32_CONST2 * val + *s : val;
}


(For now I have reduced that to 8 bytes per iteration, as I am worried about register pressure there: the 386 ISA does not have enough registers. Perhaps it needs more testing...)
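For readers who want to experiment with this scheme outside U++, here is a self-contained sketch with the same structure: two 4-byte lanes for inputs of 16 bytes or more, a final combining step, and a last 4-byte read that ends exactly at the end of the buffer. The values of HASH32_CONST1/HASH32_CONST2 are not shown in the thread, so kSeed and kMul are placeholders (the FNV-1a 32-bit offset basis and prime); MemHashSketch, Load32 and Load16 are hypothetical names, and memcpy is used for unaligned loads instead of casting to dword *.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder constants (assumption): not the U++ HASH32_CONST1/2 values.
constexpr uint32_t kSeed = 0x811c9dc5u;
constexpr uint32_t kMul  = 0x01000193u;

// Unaligned loads that do not violate strict aliasing.
static inline uint32_t Load32(const unsigned char *p)
{
	uint32_t v;
	std::memcpy(&v, p, sizeof v);
	return v;
}

static inline uint16_t Load16(const unsigned char *p)
{
	uint16_t v;
	std::memcpy(&v, p, sizeof v);
	return v;
}

uint32_t MemHashSketch(const void *ptr, size_t len)
{
	const unsigned char *s = static_cast<const unsigned char *>(ptr);
	uint32_t val = kSeed;
	if(len >= 4) {
		if(len >= 16) {
			// Two lanes: independent dependency chains the CPU can overlap.
			uint32_t val1 = kSeed, val2 = kSeed;
			while(len >= 8) {
				val1 = kMul * val1 + Load32(s);
				val2 = kMul * val2 + Load32(s + 4);
				s += 8;
				len -= 8;
			}
			// Final combining, only paid for longer inputs.
			val = kMul * val + val1;
			val = kMul * val + val2;
		}
		// Whole words, then one last word ending at the buffer end
		// (it may overlap bytes that were already hashed).
		const unsigned char *e = s + len - 4;
		while(s < e) {
			val = kMul * val + Load32(s);
			s += 4;
		}
		return kMul * val + Load32(e);
	}
	if(len >= 2) {
		val = kMul * val + Load16(s);
		return kMul * val + Load16(s + len - 2);
	}
	return len ? kMul * val + s[0] : val;
}
```

The overlapping final read avoids a byte-by-byte tail loop: any length of 4 or more ends with exactly one 32-bit load.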
Re: BufferPainter::Clear() optimization [message #54128 is a reply to message #54057] Tue, 02 June 2020 13:59
Tom1
Messages: 1305
Registered: March 2007
Ultimate Contributor
Hi Mirek,

What's the current status of the new BufferPainter optimizations? More specifically, the AlphaBlend variants. Are they on their way to the BufferPainter?

Best regards,

Tom
Re: BufferPainter::Clear() optimization [message #54132 is a reply to message #54128] Tue, 02 June 2020 17:43
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
Well, somehow I dug myself deeper into the mem* functions (memeq*, memhash) and their optimisations (switching to 64-bit hashes)... Hopefully that is all done for now (although in the future I plan to do aarch64 and NEON optimizations too).

I think I will be able to return to AlphaBlend soon.

Mirek
Re: BufferPainter::Clear() optimization [message #54133 is a reply to message #54132] Tue, 02 June 2020 18:31
Tom1
Messages: 1305
Registered: March 2007
Ultimate Contributor
Hi Mirek,

Thanks for the update. I'll stay tuned on this channel.

Best regards,

Tom
Re: BufferPainter::Clear() optimization [message #54155 is a reply to message #54133] Thu, 04 June 2020 17:23
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
SSE2 alpha blending committed. I see a 10% improvement in a heavily blended example. Looks like the low-hanging fruit is long gone :)

Mirek
Re: BufferPainter::Clear() optimization [message #54157 is a reply to message #54155] Thu, 04 June 2020 17:45
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
OK, that might have been a bit too pessimistic; in some other examples the speedup is noticeable. Somewhat expectedly, however, the gain is larger in single-threaded mode and smaller in MT.

Note: I have added a "NOSIMD" flag that makes it possible to turn the new code off.

Mirek
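The committed AlphaBlend.h code is not shown in the thread, so the following is only a sketch of how SSE2 can blend one premultiplied RGBA pixel: the four 8-bit channels are widened to 16-bit lanes, multiplied by the inverse alpha in a single instruction, and packed back to bytes. It assumes the byte order r, g, b, a and the blend dst = src + (dst * (256 - a') >> 8) with a' = a + (a >> 7); Rgba, BlendScalar, BlendSse2 and the SKETCH_NOSIMD switch (standing in for a NOSIMD-style flag) are hypothetical names. Builds without SSE2, or with SKETCH_NOSIMD defined, use the scalar formula, which gives identical results for valid premultiplied input.

```cpp
#include <cstdint>
#include <cstring>

#if (defined(__SSE2__) || defined(_M_X64)) && !defined(SKETCH_NOSIMD)
#include <emmintrin.h>
#define SKETCH_USE_SSE2 1
#endif

// Premultiplied RGBA pixel (assumed memory order: r, g, b, a).
struct Rgba { uint8_t r, g, b, a; };

// Scalar reference; a' = a + (a >> 7) maps alpha 255 to 256 so that an
// opaque source fully replaces dst.
inline Rgba BlendScalar(Rgba dst, Rgba src)
{
	const int inv = 256 - (src.a + (src.a >> 7));
	return Rgba{ uint8_t(src.r + (dst.r * inv >> 8)),
	             uint8_t(src.g + (dst.g * inv >> 8)),
	             uint8_t(src.b + (dst.b * inv >> 8)),
	             uint8_t(src.a + (dst.a * inv >> 8)) };
}

#ifdef SKETCH_USE_SSE2
// Widens 4 bytes into 16-bit lanes 0..3 (lanes 4..7 are zero).
inline __m128i Widen(Rgba p)
{
	uint32_t v;
	std::memcpy(&v, &p, sizeof v);
	return _mm_unpacklo_epi8(_mm_cvtsi32_si128(int(v)), _mm_setzero_si128());
}

inline Rgba BlendSse2(Rgba dst, Rgba src)
{
	__m128i s = Widen(src);
	__m128i d = Widen(dst);
	// Broadcast alpha (lane 3) to lanes 0..3, then a' = a + (a >> 7).
	__m128i a = _mm_shufflelo_epi16(s, _MM_SHUFFLE(3, 3, 3, 3));
	a = _mm_add_epi16(a, _mm_srli_epi16(a, 7));
	__m128i inv = _mm_sub_epi16(_mm_set1_epi16(256), a);
	// dst * inv <= 255 * 256 fits in 16 unsigned bits; the logical shift is exact.
	d = _mm_srli_epi16(_mm_mullo_epi16(d, inv), 8);
	__m128i packed = _mm_packus_epi16(_mm_add_epi16(s, d), _mm_setzero_si128());
	uint32_t v = uint32_t(_mm_cvtsi128_si32(packed));
	Rgba out;
	std::memcpy(&out, &v, sizeof out);
	return out;
}
#endif

inline Rgba Blend(Rgba dst, Rgba src)
{
#ifdef SKETCH_USE_SSE2
	return BlendSse2(dst, src);
#else
	return BlendScalar(dst, src);
#endif
}
```

Real code would process two or more pixels per iteration to amortise the unpack/pack overhead; one pixel keeps the sketch readable.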
Re: BufferPainter::Clear() optimization [message #54158 is a reply to message #54157] Thu, 04 June 2020 18:07
Novo
Messages: 1430
Registered: December 2006
Ultimate Contributor
Problem with Mac 10.13:
/Users/ssg/.local/soft/bb-worker/worker/m-upp/build/uppsrc/Painter/AlphaBlend.h:57:2: error: use of undeclared identifier '_mm_storeu_si64'
        _mm_storeu_si64(rgba, PackRGBA(x, _mm_setzero_si128()));
        ^


Regards,
Novo
Re: BufferPainter::Clear() optimization [message #54160 is a reply to message #54155] Thu, 04 June 2020 18:48
Tom1
Messages: 1305
Registered: March 2007
Ultimate Contributor
mirek wrote on Thu, 04 June 2020 18:23
SSE2 alpha blending committed. I see a 10% improvement in a heavily blended example. Looks like the low-hanging fruit is long gone :)

Mirek


Hi Mirek,

Thanks! This is a welcome improvement. When rendering complex maps with MT, I see an overall improvement of 4 to 20 %, depending on the content. None of the geometries are transparent themselves, but the (antialiased) edges of strokes and fills likely do benefit from this.

Having the improvement mostly on the ST side is nice, as (soft) real-time processes are less disturbed by the GUI being rendered by BufferPainter in single-threaded mode.

Thanks and best regards,

Tom
Re: BufferPainter::Clear() optimization [message #54163 is a reply to message #54158] Thu, 04 June 2020 20:20
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
Novo wrote on Thu, 04 June 2020 18:07
Problem with Mac 10.13:
/Users/ssg/.local/soft/bb-worker/worker/m-upp/build/uppsrc/Painter/AlphaBlend.h:57:2: error: use of undeclared identifier '_mm_storeu_si64'
        _mm_storeu_si64(rgba, PackRGBA(x, _mm_setzero_si128()));
        ^


Should now be, eh... worked around.

Mirek
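The thread does not show which workaround was committed. One common option, assuming the only problem is that older compiler headers lack _mm_storeu_si64, is _mm_storel_epi64, an SSE2 intrinsic that stores the low 64 bits of a vector to memory; the failing line quoted above would then read _mm_storel_epi64((__m128i *)rgba, PackRGBA(x, _mm_setzero_si128())). StoreLow64 and CopyLow8 are hypothetical helpers, and the memcpy branch only keeps the sketch compiling on non-SSE2 targets.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

// Stores the low 64 bits of v to dst; the underlying MOVQ store does not
// require an aligned address.
inline void StoreLow64(void *dst, __m128i v)
{
	_mm_storel_epi64(static_cast<__m128i *>(dst), v);
}
#endif

// Hypothetical helper: copies the first 8 of (at least) 16 readable source
// bytes to dst, through an SSE2 register when available.
inline void CopyLow8(void *dst, const uint8_t *src)
{
#if defined(__SSE2__) || defined(_M_X64)
	StoreLow64(dst, _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
#else
	std::memcpy(dst, src, 8);
#endif
}
```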
Re: BufferPainter::Clear() optimization [message #54220 is a reply to message #54163] Fri, 12 June 2020 12:23
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
I have finally figured out how to SSE2-optimize the ImageSpan code, so we now have about a 20% boost when rendering Images in Painter with bilinear interpolation...
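The ImageSpan changes themselves are not shown in the thread; the following sketch only illustrates the kind of arithmetic that vectorises well in bilinear sampling. The four neighbouring RGBA pixels are weighted in 8.8 fixed point, and with SSE2 all four channels of a pixel share one multiply. Px, Weights, BilinearScalar and BilinearSse2 are hypothetical names, and shifting each product right before summing (losing a little precision) is a simplification chosen here to keep every 16-bit lane in range.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_BILINEAR_SSE2 1
#endif

struct Px { uint8_t r, g, b, a; };

// 2x2 weights in 8.8 fixed point; fx, fy in 0..255 are the fractional sample
// position in 1/256 pixel units. The four weights sum to at most 256.
inline void Weights(int fx, int fy, int w[4])
{
	w[0] = (256 - fx) * (256 - fy) >> 8; // top-left
	w[1] = fx * (256 - fy) >> 8;         // top-right
	w[2] = (256 - fx) * fy >> 8;         // bottom-left
	w[3] = fx * fy >> 8;                 // bottom-right
}

// Scalar reference; shifting each product before summing keeps channels <= 255.
inline Px BilinearScalar(const Px p[4], int fx, int fy)
{
	int w[4];
	Weights(fx, fy, w);
	int r = 0, g = 0, b = 0, a = 0;
	for(int i = 0; i < 4; i++) {
		r += p[i].r * w[i] >> 8;
		g += p[i].g * w[i] >> 8;
		b += p[i].b * w[i] >> 8;
		a += p[i].a * w[i] >> 8;
	}
	return Px{ uint8_t(r), uint8_t(g), uint8_t(b), uint8_t(a) };
}

#ifdef SKETCH_BILINEAR_SSE2
// Same arithmetic; all four channels of a pixel are scaled in one multiply.
inline Px BilinearSse2(const Px p[4], int fx, int fy)
{
	int w[4];
	Weights(fx, fy, w);
	const __m128i zero = _mm_setzero_si128();
	__m128i acc = zero;
	for(int i = 0; i < 4; i++) {
		uint32_t v;
		std::memcpy(&v, &p[i], sizeof v);
		__m128i px = _mm_unpacklo_epi8(_mm_cvtsi32_si128(int(v)), zero);
		// px * w <= 255 * 256 fits in 16 unsigned bits; the logical shift is exact.
		acc = _mm_add_epi16(acc, _mm_srli_epi16(_mm_mullo_epi16(px, _mm_set1_epi16(short(w[i]))), 8));
	}
	uint32_t v = uint32_t(_mm_cvtsi128_si32(_mm_packus_epi16(acc, zero)));
	Px out;
	std::memcpy(&out, &v, sizeof out);
	return out;
}
#endif

inline Px Bilinear(const Px p[4], int fx, int fy)
{
#ifdef SKETCH_BILINEAR_SSE2
	return BilinearSse2(p, fx, fy);
#else
	return BilinearScalar(p, fx, fy);
#endif
}
```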
Re: BufferPainter::Clear() optimization [message #54221 is a reply to message #54220] Fri, 12 June 2020 12:55
Tom1
Messages: 1305
Registered: March 2007
Ultimate Contributor
Hi Mirek,

Thanks! This also seems to improve FILL_FAST speed. Was this expected?

Comparing 2020.1 with all the latest enhancements together, rendering an ImageBuffer by first clearing it and then drawing a large raster image with FILL_FAST is now down to 2.8 ms from 4.4 ms! :)

Thanks and best regards,

Tom
Re: BufferPainter::Clear() optimization [message #54222 is a reply to message #54221] Fri, 12 June 2020 16:28
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
Tom1 wrote on Fri, 12 June 2020 12:55
Hi Mirek,

Thanks! This also seems to improve FILL_FAST speed. Was this expected?


It was not quite expected, but I did notice it... Looks like the trivial FP solution beats the integer tricks...

Mirek
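The thread does not say which integer tricks were replaced, so this is only an illustration of the two general approaches for mapping destination pixels to source columns in nearest-neighbour sampling: computing each coordinate directly in floating point versus stepping a 16.16 fixed-point accumulator. MapFloat and MapFixed are hypothetical names. They agree for simple ratios such as 4 to 8 or 8 to 4 pixels, but the truncated fixed-point step can make them differ by one column for other ratios, and which one is faster has to be measured on the target CPU.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Floating point: each destination pixel is mapped independently from its centre.
std::vector<int> MapFloat(int srcw, int dstw)
{
	std::vector<int> out(dstw);
	const double scale = double(srcw) / dstw;
	for(int x = 0; x < dstw; x++)
		out[x] = std::min(int((x + 0.5) * scale), srcw - 1);
	return out;
}

// Integer: a 16.16 fixed-point position advanced by a constant step per pixel.
std::vector<int> MapFixed(int srcw, int dstw)
{
	std::vector<int> out(dstw);
	const int64_t step = (int64_t(srcw) << 16) / dstw;
	int64_t pos = step / 2;
	for(int x = 0; x < dstw; x++, pos += step)
		out[x] = std::min(int(pos >> 16), srcw - 1);
	return out;
}
```

In the floating-point version each pixel is independent of the previous one, which also makes it easy to process several pixels at once.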
Re: BufferPainter::Clear() optimization [message #54223 is a reply to message #54222] Fri, 12 June 2020 18:45
Novo
Messages: 1430
Registered: December 2006
Ultimate Contributor
Could you please fix a compilation error on Mac? It was introduced a couple of days ago.
In file included from /Users/ssg/.local/soft/bb-worker/worker/m-upp/build/uppsrc/Core/App.cpp:4:
In file included from /usr/include/mach-o/dyld.h:31:
/usr/include/mach-o/loader.h:56:2: error: unknown type name 'cpu_type_t'; did you mean 'Upp::cpu_type_t'?
        cpu_type_t      cputype;        /* cpu specifier */
        ^
/usr/include/mach/machine.h:70:19: note: 'Upp::cpu_type_t' declared here
typedef integer_t       cpu_type_t;
                        ^

TIA


Regards,
Novo
Re: BufferPainter::Clear() optimization [message #54227 is a reply to message #54223] Sat, 13 June 2020 10:15
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
Novo wrote on Fri, 12 June 2020 18:45
Could you please fix a compilation error on Mac? It was introduced a couple of days ago.
In file included from /Users/ssg/.local/soft/bb-worker/worker/m-upp/build/uppsrc/Core/App.cpp:4:
In file included from /usr/include/mach-o/dyld.h:31:
/usr/include/mach-o/loader.h:56:2: error: unknown type name 'cpu_type_t'; did you mean 'Upp::cpu_type_t'?
        cpu_type_t      cputype;        /* cpu specifier */
        ^
/usr/include/mach/machine.h:70:19: note: 'Upp::cpu_type_t' declared here
typedef integer_t       cpu_type_t;
                        ^

TIA


Hopefully fixed, please check.
Re: BufferPainter::Clear() optimization [message #54228 is a reply to message #54227] Sat, 13 June 2020 10:33
coolman
Messages: 119
Registered: April 2006
Location: Czech Republic
Experienced Member
Hi,

The commit "Core: Fixed compilation issue in MacOS" broke the build on Linux:

lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::SysImageRealized(Upp::Image const&)':
MakeCache.cpp:(.text._ZN3Upp16SysImageRealizedERKNS_5ImageE+0xd): undefined reference to `Upp::IsValueCacheActive()'
MakeCache.cpp:(.text._ZN3Upp16SysImageRealizedERKNS_5ImageE+0x46): undefined reference to `Upp::ValueCacheMutex'
MakeCache.cpp:(.text._ZN3Upp16SysImageRealizedERKNS_5ImageE+0x5b): undefined reference to `Upp::TheValueCache()'
lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::SysImageReleased(Upp::Image const&)':
MakeCache.cpp:(.text._ZN3Upp16SysImageReleasedERKNS_5ImageE+0xf): undefined reference to `Upp::IsValueCacheActive()'
MakeCache.cpp:(.text._ZN3Upp16SysImageReleasedERKNS_5ImageE+0x3f): undefined reference to `Upp::ValueCacheMutex'
MakeCache.cpp:(.text._ZN3Upp16SysImageReleasedERKNS_5ImageE+0x55): undefined reference to `Upp::TheValueCache()'
lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::SetMakeImageCacheMax(int)':
MakeCache.cpp:(.text._ZN3Upp20SetMakeImageCacheMaxEi+0xb): undefined reference to `Upp::SetupValueCache(int, int, double)'
lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::SetMakeImageCacheSize(int)':
MakeCache.cpp:(.text._ZN3Upp21SetMakeImageCacheSizeEi+0xb): undefined reference to `Upp::SetupValueCache(int, int, double)'
lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::SweepMkImageCache()':
MakeCache.cpp:(.text._ZN3Upp17SweepMkImageCacheEv+0x1): undefined reference to `Upp::AdjustValueCache()'
lib/libDraw-lib.a(MakeCache.cpp.o): In function `Upp::MakeImage__(Upp::ImageMaker const&, bool)':
MakeCache.cpp:(.text._ZN3Upp11MakeImage__ERKNS_10ImageMakerEb+0x25): undefined reference to `Upp::MakeValue(Upp::LRUCache<Upp::Value, Upp::String>::Maker&)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
CMakeFiles/ide-bin.dir/build.make:326: recipe for target 'bin/ide' failed


BR, Radek
Re: BufferPainter::Clear() optimization [message #54230 is a reply to message #54227] Sat, 13 June 2020 13:07
Novo
Messages: 1430
Registered: December 2006
Ultimate Contributor
mirek wrote on Sat, 13 June 2020 04:15

Hopefully fixed, please check.

All three platforms are currently broken at the linking stage.


Regards,
Novo
Re: BufferPainter::Clear() optimization [message #54232 is a reply to message #54230] Sat, 13 June 2020 14:45
coolman
Messages: 119
Registered: April 2006
Location: Czech Republic
Experienced Member
Hi,

The commit "Core: Fixed to compile" fixed the build on Linux.

Radek
Re: BufferPainter::Clear() optimization [message #54245 is a reply to message #54232] Sun, 14 June 2020 12:45
Didier
Messages: 736
Registered: November 2008
Location: France
Contributor
Hello all,

While searching for information on vectorisation techniques, I stumbled upon this:
https://godbolt.org/

This website lets you compile small pieces of code with many different compilers and examine the assembler output, which makes it very useful for getting the best performance out of code.

It may help us arrive at good vectorised code faster, and for many compilers at once.
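As an example of how Compiler Explorer can be used here: paste a small kernel such as the fill loop below (essentially what a buffer Clear() has to do), select x86-64 GCC and Clang, and compare flags such as -O2 and -O3 to see whether and how the loop gets vectorised. Fill32 is a hypothetical name, and the generated code depends on the compiler version and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Fills count 32-bit pixels with one value; a candidate for auto-vectorisation.
void Fill32(uint32_t *t, uint32_t value, size_t count)
{
	for(size_t i = 0; i < count; i++)
		t[i] = value;
}
```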
Re: BufferPainter::Clear() optimization [message #54246 is a reply to message #54245] Sun, 14 June 2020 14:09
mirek
Messages: 14271
Registered: November 2005
Ultimate Member
RescaleFilter now SSE2 optimised too...