Optimizing svo_memeq -- just for curiosity [message #42267] Mon, 03 March 2014 16:32
Tom1
Hi,

I tinkered a bit with svo_memeq (svo_memeq_t below) and found that simplifying the code can improve performance dramatically when compiled with MSC9/MSC10 in Speed build mode:
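template <class tchar>
force_inline bool svo_memeq_t(const tchar *a, const tchar *b, int len){
	// no elements left: equal; first elements differ: not equal;
	// otherwise compare the rest (the recursive call is in tail position)
	return !len-- ? true : *a++!=*b++ ? false : svo_memeq_t(a,b,len);
}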


Short lengths get roughly a five- to six-fold improvement, and longer lengths (over 12) do even better. (Anyway, this is what I found on Windows on an Intel processor.)

OK, this is recursive and the stack can't handle unlimited comparison lengths, so it can't replace the original code as is. This is just for those interested in how compiler optimization works.
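The recursive call in svo_memeq_t is in tail position, so an optimizing compiler may turn it into a loop. For comparison, here is a minimal sketch of the same element-by-element comparison written as an explicit loop, which has no recursion depth limit. svo_memeq_loop is a hypothetical name, and plain inline stands in for U++'s force_inline so the snippet compiles on its own; nothing here claims that MSC generates code as fast as for the recursive form.

```cpp
#include <cstddef>

// Hypothetical loop counterpart of the recursive svo_memeq_t above.
// 'inline' replaces U++'s force_inline so this compiles without U++.
template <class tchar>
inline bool svo_memeq_loop(const tchar *a, const tchar *b, int len)
{
    // For len >= 0 this visits exactly the elements the recursive version
    // visits; unlike it, a negative len is treated as an empty range.
    while(len-- > 0)
        if(*a++ != *b++)   // first mismatch ends the comparison
            return false;
    return true;           // all len elements matched (or len == 0)
}
```

Usage: svo_memeq_loop("hello", "help!", 3) compares only the first three characters, so it returns true, while a length of 5 reaches the mismatch and returns false.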

Best regards,

Tom
Re: Optimizing svo_memeq -- just for curiosity [message #42273 is a reply to message #42267] Tue, 04 March 2014 08:03
mirek


Well, I just could not stop thinking about String::Find(String)... and got some new ideas on how to optimize it even more. The key point is that on Intel CPUs since about 2010 (and AMD CPUs from the same era), unaligned memory access no longer has a performance penalty (and even before that, the penalty was not that high, say 50%).

This means that on x86-64 you can compare up to 16 bytes of unaligned data with just two compares...
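One way to read the two-compares remark, sketched here as an assumption rather than the actual String::Find code: for a length between 8 and 16 bytes, load the first 8 and the last 8 bytes of each buffer as 64-bit integers; the two windows may overlap, but together they cover every byte. The names load64, load32 and memeq16 are hypothetical. memcpy is used for the unaligned loads because dereferencing a misaligned uint64_t pointer is undefined behavior in C++; on x86-64, optimizing compilers typically turn such a memcpy into a single load instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: load 8 or 4 bytes from a possibly unaligned address.
static inline std::uint64_t load64(const void *p)
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

static inline std::uint32_t load32(const void *p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical name: compares len bytes, where len must be <= 16.
inline bool memeq16(const void *a, const void *b, std::size_t len)
{
    assert(len <= 16);  // longer ranges would leave middle bytes unchecked
    const unsigned char *p = static_cast<const unsigned char *>(a);
    const unsigned char *q = static_cast<const unsigned char *>(b);
    if(len >= 8)        // 8..16 bytes: first 8 and last 8 bytes (may overlap)
        return load64(p) == load64(q) &&
               load64(p + len - 8) == load64(q + len - 8);
    if(len >= 4)        // 4..7 bytes: same trick with 32-bit loads
        return load32(p) == load32(q) &&
               load32(p + len - 4) == load32(q + len - 4);
    for(std::size_t i = 0; i < len; i++)  // 0..3 bytes: byte by byte
        if(p[i] != q[i])
            return false;
    return true;
}
```

The overlap is harmless: bytes in the shared window are simply compared twice. Lengths below 4 fall back to a byte loop, and the caller must keep len at or below 16.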

Mirek