Re: Painter refactored/optimized [message #50559 is a reply to message #50558]
Thu, 15 November 2018 11:43
Tom1
Messages: 1242 Registered: March 2007
Senior Contributor
Hi,
The difference is so large that it makes me wonder if ST allocates/resets any rasterizers at all on the fly?
Could the number of rendering threads be pre-selected, and a sufficient number of rasterizers be pre-allocated for MT, so that there would be no extra allocation/reset penalty when re-using the same BufferPainter -- as was just introduced by BufferPainter::Create?
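To make the idea concrete, here is a minimal sketch of a pool that keeps one scratch buffer per rendering thread and only allocates when a buffer is missing or too small. It is plain standard C++, not the actual BufferPainter internals: `Scratch`, `ScratchPool`, `Prepare` and `Get` are hypothetical names, and `Scratch` is just a stand-in for a rasterizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for per-thread rasterizer state (hypothetical, not a U++ type).
struct Scratch {
    std::vector<unsigned char> cells;
};

// Keeps per-thread scratch buffers alive between renders.
class ScratchPool {
    std::vector<Scratch> slots;

public:
    // Makes sure `threads` slots exist, each with room for `bytes`.
    // Returns how many buffers had to be (re)allocated by this call.
    int Prepare(int threads, std::size_t bytes)
    {
        assert(threads > 0);
        int grown = 0;
        if(slots.size() < static_cast<std::size_t>(threads))
            slots.resize(threads); // existing buffers are moved, not reallocated
        for(int i = 0; i < threads; i++) {
            std::vector<unsigned char>& c = slots[i].cells;
            if(c.capacity() < bytes) {
                c.reserve(bytes);
                grown++;
            }
            c.clear(); // reset contents, keep the allocation
        }
        return grown;
    }

    Scratch& Get(int i)
    {
        assert(i >= 0 && static_cast<std::size_t>(i) < slots.size());
        return slots[i];
    }
};
```

With a pool like this, a second `Prepare` call with the same thread count and size reports zero new allocations, which is the behaviour being asked for when the same painter is reused.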
Best regards,
Tom

Re: Painter refactored/optimized [message #50561 is a reply to message #50560]
Thu, 15 November 2018 12:14
Tom1
Hi,
You say <ms??? ... you mean below one millisecond for MT??? I get something like 16 ms for MT and 300 us for ST... What exactly are your readings?
I bet your hardware is Superb! Mine is Core i7 4790K @ 4 GHz (4C/8T). Windows 10 Professional 64 bit. Compiled with MSBT17x64.
Do you have anything this old to test with?
Best regards,
Tom
[Updated on: Thu, 15 November 2018 12:18]

Re: Painter refactored/optimized [message #50562 is a reply to message #50561]
Thu, 15 November 2018 12:33
mirek
Messages: 14039 Registered: November 2005
Ultimate Member
Tom1 wrote on Thu, 15 November 2018 12:14:
> Hi,
> You say <ms??? ... you mean below one millisecond for MT??? I get something like 16 ms for MT and 300 us for ST... What exactly are your readings?
> I bet your hardware is Superb! Mine is Core i7 4790K @ 4 GHz (4C/8T). Windows 10 Professional 64 bit. Compiled with MSBT17x64.
> Do you have anything this old to test with?
> Best regards,
> Tom
Nope, that is just a difference in testing, sorry. I have adapted it to my development package (which is benchmarks/LionBenchmark). There I am testing by repeatedly doing the paint, with the same BufferPainter, until 1 second has been spent, then computing the time per render from the number of renders achieved.
It is sort of similar to having a single global BufferPainter.
My numbers with your example are about the same for ST and half for MT - at least those 8 cores show up.
Now if I insert some benchmarking code, it is obvious that those 8 ms in MT are spent allocating / initializing memory...
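The repeat-until-a-time-budget measurement described above can be sketched roughly as follows. This is plain standard C++ and is not the code from benchmarks/LionBenchmark; `BenchResult` and `RepeatFor` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Outcome of one benchmark run (hypothetical helper type).
struct BenchResult {
    std::int64_t runs;     // number of completed calls
    double usecs_per_run;  // average wall-clock microseconds per call
};

// Calls `work` over and over until `budget` has elapsed,
// then derives the average time per call from the run count.
template <class F>
BenchResult RepeatFor(std::chrono::milliseconds budget, F work)
{
    using clock = std::chrono::steady_clock;
    assert(budget.count() > 0);
    const clock::time_point start = clock::now();
    std::int64_t runs = 0;
    clock::duration elapsed{};
    do {
        work();
        ++runs;
        elapsed = clock::now() - start;
    } while(elapsed < budget);
    const double total_us =
        std::chrono::duration<double, std::micro>(elapsed).count();
    return { runs, total_us / static_cast<double>(runs) };
}
```

In the painter case the callable would presumably wrap one full render with the same BufferPainter, with a budget of 1000 ms. Because the painter is reused across iterations, one-off allocation costs are averaged away, which would explain why this method reports much lower MT times than timing a single Paint call.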
Mirek

Re: Painter refactored/optimized [message #50563 is a reply to message #50562]
Thu, 15 November 2018 12:40
mirek
OK, I have just found that I had accidentally deleted that precious initialized memory in Create. The fixed version is now in the trunk. Changing your example to use a global BufferPainter now shows some pretty significant gains:
#include <CtrlLib/CtrlLib.h>
#include <Painter/Painter.h>

using namespace Upp;

class PainterBench : public TopWindow {
public:
    Painting p;
    FileSel fs;
    BufferPainter bpainter;

    void Open(){
        if(fs.ExecuteOpen("Select a painting to view")){
            p.Clear();
            p.Serialize(FileIn(fs.Get()));
        }
    }

    virtual bool Key(dword key, int count){
        Refresh();
        switch(key){
            case K_CTRL_O:
                Open();
                return true;
        }
        return false;
    }

    typedef PainterBench CLASSNAME;

    PainterBench(){
        Sizeable();
        p.Serialize(FileIn("C:/xxx/PainteTest/SomeRocks.painting"));
    }

    virtual void Paint(Draw &draw){
        int64 STtiming=0;
        int64 MTtiming=0;
        ImageBuffer ib(GetSize());
        {
            bpainter.Create(ib);
            bpainter.Co(true);
            bpainter.PreClipDashed();
            bpainter.Clear(White());
            bpainter.EvenOdd();
            int64 t0=usecs();
            bpainter.Paint(p);
            int64 t1=usecs();
            MTtiming=t1-t0;
            bpainter.Finish();
        }
        {
            bpainter.Create(ib);
            bpainter.Co(false);
            bpainter.PreClipDashed();
            bpainter.Clear(White());
            bpainter.EvenOdd();
            int64 t0=usecs();
            bpainter.Paint(p);
            int64 t1=usecs();
            STtiming=t1-t0;
            bpainter.Finish();
        }
        SetSurface(draw,Rect(ib.GetSize()),ib,ib.GetSize(),Point(0,0));
        double gain=(double)STtiming/(double)(0.1+MTtiming); // Avoid div by zero
        Title(Format("Rendering MT took %lld us, ST took %lld us, MT gain is %.2f",MTtiming,STtiming,gain));
    }
};

GUI_APP_MAIN
{
    PainterBench().Run();
}
[Updated on: Thu, 15 November 2018 12:41]

Re: Painter refactored/optimized [message #50565 is a reply to message #50564]
Thu, 15 November 2018 13:23
Tom1
Hi,
One minor issue: When in Paint with a global BufferPainter and only calling bufferpainter.Create(ib); the rendered area does not change to the current ib size. (E.g. after maximizing the window, the bufferpainter will only render on the initial ib area, leaving the rest white.) I need to additionally call bufferpainter.Co(true or false); to get the bufferpainter to work on the current ib size.
This is not a problem for me, but maybe it would be more appropriate to handle the resizing in Create somehow.
Best regards,
Tom
[Updated on: Thu, 15 November 2018 13:26]

Re: Painter refactored/optimized [message #50569 is a reply to message #50567]
Fri, 16 November 2018 10:23
Tom1
Hi Mirek,
While on the subject, I decided to do some testing of thread count for MT Painter. What I found was interesting: My typical map renders at roughly 250 ms with ST and 100 ms with default 10 thread MT. (On my hardware CPU_Cores() returns 8 and CoWork initializes a thread pool of 10 threads.)
So I tampered a little bit with CoWork.cpp, trying with different thread counts:
int CoWork::GetPoolSize()
{
    int n = GetPool().threads.GetCount();
//  return n ? n : CPU_Cores() + 2;
    return n ? n : 4;
}

CoWork::Pool::Pool()
{
    ASSERT(!IsWorker());
//  InitThreads(CPU_Cores() + 2);
    InitThreads(4);
    free = NULL;
    for(int i = 0; i < SCHEDULED_MAX; i++)
        Free(slot[i]);
    quit = false;
}
In this test I ended up with four threads, which yield about the same performance as 10 threads. When dropping to three threads or below, the MT gain started to fade away.
I think the optimal thread count for CoWork depends on the job's balance of CPU load and memory bandwidth. Also, the CPU and memory bus design changes this balance. As the new CPUs tend to offer a lot of cores (and concurrent threads), a simple or well optimized algorithm will easily saturate the memory channels with a reasonably small subset of cores being used. I'm not sure though, if there is much point in reducing threads (and therefore freeing cores for other tasks), if the memory bus will remain saturated anyway.
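One way to probe this balance on a given machine is to time a memory-bound job at several thread counts and see where the gain flattens out. The sketch below uses plain `std::thread` rather than CoWork, and `ParallelSum` and `TimeSumUsecs` are hypothetical helpers written only for illustration; summing a large array is merely an assumed stand-in for a bandwidth-limited workload, not the painter itself.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Sums `data` with `nthreads` workers, one contiguous slice per worker.
std::uint64_t ParallelSum(const std::vector<std::uint32_t>& data, int nthreads)
{
    assert(nthreads > 0);
    std::vector<std::uint64_t> partial(nthreads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
    for(int t = 0; t < nthreads; t++)
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            std::uint64_t s = 0; // local accumulator avoids false sharing
            for(std::size_t i = begin; i < end; i++)
                s += data[i];
            partial[t] = s;
        });
    for(std::thread& w : workers)
        w.join();
    std::uint64_t total = 0;
    for(std::uint64_t s : partial)
        total += s;
    return total;
}

// Wall-clock microseconds for one ParallelSum call at the given thread count.
double TimeSumUsecs(const std::vector<std::uint32_t>& data, int nthreads)
{
    const auto t0 = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = ParallelSum(data, nthreads);
    (void)sink; // keep the result alive so the work is not optimized away
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

Calling `TimeSumUsecs` for 1, 2, 3, ... threads on an array much larger than the CPU caches should show whether the timings stop improving after a few threads, similar to the four-thread plateau reported above. Thread creation cost is included in each measurement, so the array has to be large for the numbers to be meaningful.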
Best regards,
Tom

Re: Painter refactored/optimized [message #50571 is a reply to message #50570]
Fri, 16 November 2018 12:57
Tom1
Hi,
IMO your default "CPU logical cores + 2" is a well-considered compromise that keeps the CPU working full time without wasting many resources. No worries.
Best regards,
Tom