U++ forum: Welcome to the forum

What does SSE2 usage enhance? [message #39622]

Wed, 10 April 2013 13:31

crydev
Messages: 151
Registered: October 2012
Location: Netherlands

Experienced Member

A question came up. What exactly are the pros of using SSE2 as a flag in your build? What kind of things can it speed up if the processor supports it?

Re: What does SSE2 usage enhance? [message #39658 is a reply to message #39622]

Mon, 15 April 2013 18:34

mirek
Messages: 14265
Registered: November 2005

Ultimate Member

crydev wrote on Wed, 10 April 2013 07:31

A question came up. What exactly are the pros of using SSE2 as a flag in your build? What kind of things can it speed up if the processor supports it?

The SSE2 flag tells the compiler to use SSE2 for FP arithmetic in 32-bit x86 builds. (In 64-bit builds, it is always on.)

Cons: The executable will not work on CPUs without SSE2. SSE2 has been supported since the Pentium 4 and Athlon 64, so basically for more than 10 years now.
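
To check whether a given build really has SSE2 code generation enabled, the compiler's predefined macros can be tested. This is only a minimal sketch: `BuiltWithSSE2` is a hypothetical helper name, not part of U++. `__SSE2__` is predefined by GCC and Clang, while `_M_IX86_FP` and `_M_X64` are MSVC macros.

```cpp
// Returns true when this translation unit was compiled with SSE2 code
// generation enabled. BuiltWithSSE2 is a hypothetical name for illustration.
constexpr bool BuiltWithSSE2()
{
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
	// GCC/Clang define __SSE2__; MSVC defines _M_X64 for 64-bit builds and
	// sets _M_IX86_FP to 2 or higher for 32-bit builds using /arch:SSE2 or later.
	return true;
#else
	return false;
#endif
}
```

Note that this only reports how the code was compiled. It does not test at run time whether the CPU executing the program supports SSE2.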

Mirek

Re: What does SSE2 usage enhance? [message #39682 is a reply to message #39658]

Wed, 17 April 2013 10:54

crydev

Thanks for your reply, Mirek. Does that mean an SSE-enabled version of functions such as memcpy() can be used in U++? Or can I assume that enabling SSE2 in the compiler flags automatically selects an SSE2-enabled version of memcpy? I think the stock Windows function is fairly slow.

P.S. My situation is copying blocks of 1 to 8 bytes very fast and very frequently in a loop. I have a chunk of memory that I loop through. Say my loop index is currently in the middle of the chunk; then it copies 4 bytes starting at the current index.

Re: What does SSE2 usage enhance? [message #39684 is a reply to message #39682]

Wed, 17 April 2013 11:17

mirek

The SSE2 flag is unlikely to have any impact on memcpy. If memcpy is emitted as a function call (not as an intrinsic), the library function is likely SSE2-optimized anyway. For the inlined intrinsic form, SSE2 code would be far too complicated.

The main difference is in code using FP arithmetic: without SSE2, it uses the x87 FP stack; with SSE2, it uses the XMM register file, which is potentially faster.
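
One observable side effect of the x87 versus XMM choice is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The sketch below reports it; `FpEvaluationMode` is a hypothetical helper name. The assumption here is a typical GCC or MSVC toolchain, where x87 code generation usually yields 2 and SSE2 code generation usually yields 0. With GCC the relevant switches are `-msse2 -mfpmath=sse`, with MSVC `/arch:SSE2`; how the U++ build flag maps to them is not stated in this thread.

```cpp
#include <cfloat>

// Describes how the compiler evaluates intermediate floating-point results.
// FpEvaluationMode is a hypothetical name used only for this illustration.
const char *FpEvaluationMode()
{
#if !defined(FLT_EVAL_METHOD)
	return "unknown (FLT_EVAL_METHOD not defined)";
#elif FLT_EVAL_METHOD == 0
	// Each operation is evaluated in the precision of its operand type.
	return "operand precision (typical for SSE2/XMM code generation)";
#elif FLT_EVAL_METHOD == 2
	// Intermediates are kept in long double precision.
	return "long double precision (typical for x87 code generation)";
#else
	return "other or indeterminate";
#endif
}
```

Printing the returned string from a 32-bit build with and without the SSE2 flag is a quick way to see which FP unit the compiler targets.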

Mirek

Re: What does SSE2 usage enhance? [message #39685 is a reply to message #39682]

Wed, 17 April 2013 11:20

mirek

crydev wrote on Wed, 17 April 2013 04:54

P.S. My situation is copying blocks of 1 to 8 bytes very fast and very frequently in a loop. I have a chunk of memory that I loop through. Say my loop index is currently in the middle of the chunk; then it copies 4 bytes starting at the current index.

BTW, you might have a look at SVO_MEMCPY.

Re: What does SSE2 usage enhance? [message #39691 is a reply to message #39685]

Wed, 17 April 2013 22:04

crydev

mirek wrote on Wed, 17 April 2013 11:20

BTW, you might have a look at SVO_MEMCPY.

Thanks for the suggestion, Mirek. I tried it, but it is actually a lot slower than the conventional memcpy. I stepped through the assembly, though, and it seems that the default memcpy is already partially optimized for SSE2.

Re: What does SSE2 usage enhance? [message #39694 is a reply to message #39691]

Thu, 18 April 2013 06:34

mirek

Are you really copying just 1-8 bytes? For such small amounts and unaligned data, SSE2 is IMO meaningless.

Mirek

Re: What does SSE2 usage enhance? [message #39696 is a reply to message #39694]

Thu, 18 April 2013 08:33

crydev

Yes, it is only about small blocks of 1 to 8 bytes. I'll stay with memcpy; it is good enough. Thanks :)

Re: What does SSE2 usage enhance? [message #39700 is a reply to message #39696]

Thu, 18 April 2013 10:38

mirek

Well, I am asking because in that case SVO_MEMCPY has a problem (it is supposed to be an optimization over memcpy for small blocks).

Can you show me a small code snippet?

Mirek

Re: What does SSE2 usage enhance? [message #39702 is a reply to message #39700]

Thu, 18 April 2013 11:43

crydev

Here is a small code snippet. If you see anything strange or wrong, I would appreciate feedback. :)

The memcpy line is where I copy, in this case, 4 bytes starting at offset i of the array buffer into the float variable tempStore.

template<>
void MemoryScanner::ScanWorker(const MemoryRegion& region, const float& value)
{
	Byte *buffer = (Byte*)MemoryAlloc(region.MemorySize);
	if (!ReadProcessMemory(this->mOpenedProcessHandle, (void*)region.BaseAddress, buffer
		, region.MemorySize, NULL))
	{
		MemoryFree(buffer);
		return;
	}
	
	Vector<MemoryBlockBase*> localResults;
	
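	// Stop sizeof(float) bytes before the end so the memcpy below never reads past the buffer.
	for (int i = 0; i + sizeof(float) <= region.MemorySize; i++)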
	{
		float tempStore;
		memcpy(&tempStore, &(buffer[i]), sizeof(float));
		
		if (TemplateCompare(tempStore, value)) // WRITES TO FREED BLOCKS DETECTED
		{
			MemoryBlock<float>* mb = new MemoryBlock<float>();
			mb->Address = static_cast<unsigned int>(region.BaseAddress + i);
			mb->Size = sizeof(float);
			mb->Buffer = tempStore;
			localResults.Add(mb);
		}
	}

	MemoryFree(buffer);

	AtomicInc(this->ThreadFinishCount);

	if (localResults.GetCount() > 0)
	{
		this->AddThreadSpecificSearchResults(localResults);
	}

	this->UpdateScanningProgress(this->ThreadFinishCount);
}
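
For readers who want to try the scanning loop outside of U++ and Windows, here is a portable sketch of the same idea. It is not the code from the post: `FloatMatch` and `ScanForFloat` are hypothetical names, the buffer is a `std::vector` instead of memory read with ReadProcessMemory, exact equality stands in for TemplateCompare, and matches are stored by value instead of being allocated with `new`.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the MemoryBlock<float> results in the post.
struct FloatMatch {
	std::size_t offset; // byte offset of the match within the buffer
	float       value;  // value read at that offset
};

// Scans every byte offset of 'buffer' for a float equal to 'value'.
std::vector<FloatMatch> ScanForFloat(const std::vector<unsigned char>& buffer, float value)
{
	std::vector<FloatMatch> matches;
	if (buffer.size() < sizeof(float))
		return matches; // too small to hold even one float

	// The last valid start offset is size - sizeof(float), so no read
	// ever goes past the end of the buffer.
	const std::size_t last = buffer.size() - sizeof(float);
	for (std::size_t i = 0; i <= last; i++) {
		float candidate;
		// The size is a compile-time constant, which is what lets
		// compilers inline this memcpy as a plain load.
		std::memcpy(&candidate, &buffer[i], sizeof(float));
		if (candidate == value)
			matches.push_back(FloatMatch{i, candidate});
	}
	return matches;
}
```

The loop bound is the main point: starting a 4-byte copy at any of the last 3 offsets would read beyond the allocated block.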

Re: What does SSE2 usage enhance? [message #39703 is a reply to message #39702]

Thu, 18 April 2013 11:58

mirek

I see. I have checked the assembly: memcpy gets inlined as a single unaligned load, which is still faster than loading the 4 bytes separately, as SVO_MOVE does.

So the actual code for this memcpy is:

00401744  mov ecx,[eax] 
00401746  mov [esp+0x4],ecx

(SVO_MOVE is designed for small variable size).
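
The difference between the two cases can be sketched as follows. Both function names are hypothetical, and `CopySmall` is NOT the actual U++ SVO_MEMCPY or SVO_MOVE code, only an illustration of a byte-wise copy for a length known at run time.

```cpp
#include <cstddef>
#include <cstring>

// Fixed-size copy: the size is a compile-time constant, so optimizing
// compilers typically replace the call with a single (possibly unaligned)
// 32-bit load, as in the two mov instructions shown above.
inline float LoadFloatFixed(const unsigned char *p)
{
	float f;
	std::memcpy(&f, p, sizeof(float));
	return f;
}

// Byte-wise copy for a small length known only at run time
// (1 to 8 bytes in this thread). Illustration only.
inline void CopySmall(void *dst, const void *src, std::size_t len)
{
	unsigned char *d = static_cast<unsigned char *>(dst);
	const unsigned char *s = static_cast<const unsigned char *>(src);
	for (std::size_t i = 0; i < len; i++)
		d[i] = s[i];
}
```

When the size is always sizeof(float), as in the scanner, the fixed-size form gives the compiler the most to work with. A variable-size helper only pays off when the length really changes from call to call.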

Mirek
