Thread: SSE2 and SVO optimization (Painter, memcpy....)
Re: BufferPainter::Clear() optimization [message #53953 is a reply to message #53952] Sun, 17 May 2020 20:56
Tom1
Hi Mirek,

Here are my results:

CLANG

TIMING HUGE memset    : 27.28 ms -  1.36 ms (27.29 ms / 20 ), min:  1.25 ms, max:  1.66 ms, nesting: 0 - 20
TIMING HUGE Fill3     : 35.01 ms -  1.75 ms (35.01 ms / 20 ), min:  1.63 ms, max:  1.99 ms, nesting: 0 - 20
TIMING HUGE Fill      : 73.74 ms -  3.69 ms (73.75 ms / 20 ), min:  3.32 ms, max:  7.43 ms, nesting: 0 - 20
TIMING HUGE memsetd   : 72.88 ms -  3.64 ms (72.88 ms / 20 ), min:  3.40 ms, max:  4.51 ms, nesting: 0 - 20
TIMING memset         :  1.01 s  -  1.01 us ( 1.07 s  / 1000000 ), min:  1.00 us, max: 28.00 us, nesting: 0 - 1000000
TIMING Fill3          : 505.44 ms - 505.44 ns (565.98 ms / 1000000 ), min:  0.00 ns, max: 29.00 us, nesting: 0 - 1000000
TIMING Fill2          : 497.06 ms - 497.06 ns (557.61 ms / 1000000 ), min:  0.00 ns, max: 28.00 us, nesting: 0 - 1000000
TIMING Fill0          : 772.53 ms - 772.53 ns (833.07 ms / 1000000 ), min:  0.00 ns, max: 63.00 us, nesting: 0 - 1000000
TIMING Fill           :  1.67 s  -  1.67 us ( 1.73 s  / 1000000 ), min:  1.00 us, max: 58.00 us, nesting: 0 - 1000000
TIMING memsetd        : 495.67 ms - 495.67 ns (556.22 ms / 1000000 ), min:  0.00 ns, max: 28.00 us, nesting: 0 - 1000000

CLANGx64

TIMING HUGE memset    : 27.76 ms -  1.39 ms (27.76 ms / 20 ), min:  1.28 ms, max:  1.80 ms, nesting: 0 - 20
TIMING HUGE Fill3     : 36.31 ms -  1.82 ms (36.31 ms / 20 ), min:  1.59 ms, max:  2.27 ms, nesting: 0 - 20
TIMING HUGE Fill      : 73.42 ms -  3.67 ms (73.42 ms / 20 ), min:  3.41 ms, max:  4.74 ms, nesting: 0 - 20
TIMING HUGE memsetd   : 74.52 ms -  3.73 ms (74.52 ms / 20 ), min:  3.47 ms, max:  4.22 ms, nesting: 0 - 20
TIMING memset         : 898.49 ms - 898.49 ns (925.83 ms / 1000000 ), min:  0.00 ns, max: 52.00 us, nesting: 0 - 1000000
TIMING Fill3          : 492.59 ms - 492.59 ns (519.92 ms / 1000000 ), min:  0.00 ns, max: 32.00 us, nesting: 0 - 1000000
TIMING Fill2          : 495.82 ms - 495.82 ns (523.15 ms / 1000000 ), min:  0.00 ns, max: 28.00 us, nesting: 0 - 1000000
TIMING Fill0          : 569.61 ms - 569.61 ns (596.95 ms / 1000000 ), min:  0.00 ns, max: 41.00 us, nesting: 0 - 1000000
TIMING Fill           : 591.56 ms - 591.56 ns (618.90 ms / 1000000 ), min:  0.00 ns, max: 30.00 us, nesting: 0 - 1000000
TIMING memsetd        : 549.04 ms - 549.04 ns (576.37 ms / 1000000 ), min:  0.00 ns, max: 65.00 us, nesting: 0 - 1000000

MSBT19

TIMING HUGE memset    : 26.51 ms -  1.33 ms (26.51 ms / 20 ), min:  1.26 ms, max:  1.49 ms, nesting: 0 - 20
TIMING HUGE Fill3     : 35.42 ms -  1.77 ms (35.42 ms / 20 ), min:  1.58 ms, max:  2.14 ms, nesting: 0 - 20
TIMING HUGE Fill      : 25.47 ms -  1.27 ms (25.48 ms / 20 ), min:  1.18 ms, max:  1.59 ms, nesting: 0 - 20
TIMING HUGE memsetd   : 25.12 ms -  1.26 ms (25.12 ms / 20 ), min:  1.15 ms, max:  1.59 ms, nesting: 0 - 20
TIMING memset         : 978.21 ms - 978.21 ns ( 1.05 s  / 1000000 ), min:  1.00 us, max: 29.00 us, nesting: 0 - 1000000
TIMING Fill3          :  1.50 s  -  1.50 us ( 1.58 s  / 1000000 ), min:  1.00 us, max: 29.00 us, nesting: 0 - 1000000
TIMING Fill2          :  1.89 s  -  1.89 us ( 1.96 s  / 1000000 ), min:  1.00 us, max: 34.00 us, nesting: 0 - 1000000
TIMING Fill0          :  2.02 s  -  2.02 us ( 2.09 s  / 1000000 ), min:  1.00 us, max: 33.00 us, nesting: 0 - 1000000
TIMING Fill           :  2.06 s  -  2.06 us ( 2.14 s  / 1000000 ), min:  1.00 us, max: 32.00 us, nesting: 0 - 1000000
TIMING memsetd        :  1.62 s  -  1.62 us ( 1.69 s  / 1000000 ), min:  1.00 us, max: 45.00 us, nesting: 0 - 1000000

MSBT19x64

TIMING HUGE memset    : 26.96 ms -  1.35 ms (26.96 ms / 20 ), min:  1.27 ms, max:  1.90 ms, nesting: 0 - 20
TIMING HUGE Fill3     : 35.07 ms -  1.75 ms (35.08 ms / 20 ), min:  1.62 ms, max:  2.02 ms, nesting: 0 - 20
TIMING HUGE Fill      : 67.09 ms -  3.35 ms (67.09 ms / 20 ), min:  3.17 ms, max:  3.60 ms, nesting: 0 - 20
TIMING HUGE memsetd   : 25.64 ms -  1.28 ms (25.64 ms / 20 ), min:  1.19 ms, max:  1.48 ms, nesting: 0 - 20
TIMING memset         : 818.75 ms - 818.75 ns (856.11 ms / 1000000 ), min:  0.00 ns, max: 31.00 us, nesting: 0 - 1000000
TIMING Fill3          :  1.36 s  -  1.36 us ( 1.40 s  / 1000000 ), min:  1.00 us, max: 31.00 us, nesting: 0 - 1000000
TIMING Fill2          :  1.67 s  -  1.67 us ( 1.70 s  / 1000000 ), min:  1.00 us, max: 30.00 us, nesting: 0 - 1000000
TIMING Fill0          :  1.66 s  -  1.66 us ( 1.70 s  / 1000000 ), min:  1.00 us, max: 46.00 us, nesting: 0 - 1000000
TIMING Fill           :  1.68 s  -  1.68 us ( 1.72 s  / 1000000 ), min:  1.00 us, max: 50.00 us, nesting: 0 - 1000000
TIMING memsetd        :  1.50 s  -  1.50 us ( 1.54 s  / 1000000 ), min:  1.00 us, max: 29.00 us, nesting: 0 - 1000000


Fill3 is generally the best, but I see two issues that the benchmark numbers above do not show directly:

1. On MSBT19 / MSBT19x64 there is a significant penalty for small counts: about 5 ns per call, whereas with CLANG it is about 0.8 - 1.0 ns per call.
2. On MSBT19x64 the optimal threshold size is 2M counts on my Core i7. Interestingly, the default threshold value works better with MSBT19 on the same computer.
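
To make the two points above concrete, here is a minimal sketch of a fill routine with a small-count fast path and a size threshold. It is not the actual Fill3 code, which is not shown in this post. The name FillSketch, the cutoff of 16 elements, and the assumption that the threshold switches from regular SSE2 stores to non-temporal (streaming) stores are all hypothetical. The default of 2M elements is just the value that worked best on MSBT19x64 in the test above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Use SSE2 only where the compiler reports it; otherwise fall back to a plain loop.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical default threshold, in 32-bit elements (2M counts).
constexpr std::size_t kStreamThreshold = std::size_t(1) << 21;

// Hypothetical sketch: fill 'count' 32-bit values starting at 'dst'.
inline void FillSketch(std::uint32_t* dst, std::uint32_t value, std::size_t count,
                       std::size_t stream_threshold = kStreamThreshold)
{
    // Precondition: dst is aligned for 32-bit access.
    assert((reinterpret_cast<std::uintptr_t>(dst) & 3) == 0);

    // Small counts: a plain loop keeps the per-call overhead low.
    if (count < 16) {
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = value;
        return;
    }

#if SKETCH_HAS_SSE2
    // Scalar stores until dst reaches a 16-byte boundary.
    while (count > 0 && (reinterpret_cast<std::uintptr_t>(dst) & 15) != 0) {
        *dst++ = value;
        --count;
    }

    const __m128i v = _mm_set1_epi32(static_cast<int>(value));
    if (count >= stream_threshold) {
        // Large fills: non-temporal stores bypass the cache.
        while (count >= 4) {
            _mm_stream_si128(reinterpret_cast<__m128i*>(dst), v);
            dst += 4;
            count -= 4;
        }
        _mm_sfence(); // make the streaming stores globally visible
    } else {
        // Medium fills: regular aligned stores stay in cache.
        while (count >= 4) {
            _mm_store_si128(reinterpret_cast<__m128i*>(dst), v);
            dst += 4;
            count -= 4;
        }
    }
#endif

    // Remaining tail (or the whole range when SSE2 is unavailable).
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = value;
}
```

With a layout like this, the small-count penalty in point 1 would depend on how cheaply the compiler handles the first branch, and the threshold in point 2 becomes a single parameter that can be tuned per compiler and CPU, for example FillSketch(buffer, color, n, 1 << 20).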

I will continue to investigate this.

Thanks and best regards,

Tom
 