|  |  | | | Home » U++ Library support » U++ Core » String should implement the Boyer Moore algo Goto Forum:
	|  |  
	|  |  
	| 
		
			| Re: String should implement the Boyer Moore algo [message #42221 is a reply to message #42220] | Fri, 28 February 2014 12:36   |  
			| 
				
				
					|  victorb Messages: 78
 Registered: December 2005
 Location: Nice, France
 | Member |  |  |  
	| Mirek, 
 The one implementation I've linked from FB is without pre-computations. It should be faster in any case - may be there should be a switch when looking one char with a tighter loop.
 
 Folly (the lib from Facebook) is designed to be very fast, they claim a 30x speed increase for "casual cases" FWIW.
 
 I'm really unsure if PCRE can perform as good as BM.
 
 Victor
 [Updated on: Fri, 28 February 2014 12:37] Report message to a moderator |  
	|  |  |  
	|  |  
	|  |  
	| 
		
			| Re: String should implement the Boyer Moore algo [message #42249 is a reply to message #42246] | Sat, 01 March 2014 20:19   |  
			| 
				
				|  |  mirek Messages: 14271
 Registered: November 2005
 | Ultimate Member |  |  |  
	| Well, I could not sleep about this, so I have decided to put it to test  
 I have created "proper" BM and BMH algorithm, then started benchamrking. Loaded 120MB XML file into String, appended "Hello world" (not present in the file otherwise) and started benchmarking.
 
 At first, U++ brute force was 10 times slower than others. Analyzing it I have found that it is not particulary well optimized (as brute force), calling memcmp on each input byte. So I have added some microoptimizations and this is what I got then:
 
 
 
------------
Needle: Hello world
U++ Brute force: 127158988
  Time elapsed: 0.037
Folly: 127158988
  Time elapsed: 0.059
Boyer-Moore: 127158988
  Time elapsed: 0.082
Boyer–Moore–Horspool: 127158988
  Time elapsed: 0.049
------------
Needle: Hel
U++ Brute force: 127158988
  Time elapsed: 0.036
Folly: 127158988
  Time elapsed: 0.063
Boyer-Moore: 127158988
  Time elapsed: 0.270
Boyer–Moore–Horspool: 127158988
  Time elapsed: 0.154
 First numbers are for the whole "Hello world" search, then i have searched only for "Hel", which accidentally is not in the file too.
 
 I believe that the only real speed advantage the folly algorithm has lies in that simple loop
 
 
 
     while (i[nsize_1] != lastNeedle) {
        if (++i == iEnd) {
          // not found
          return -1;
        }
      }
 implementing de-facto brutal force approach. Any "smart" skips are not frequent enough to change anything.
 
 As for BM and BMH, it looks like the management of skips in reality and in modern CPU is too costly as compared to streamlined comparison loops.... and becomes really a burden when needle is short. Also interesting is that simpler, less clever variant BMH (BMH is in fact simplified BM) is faster... I guess complexity in the loop shows.
 
 Well, at any case, the real result of this endeavour is optimized, albeit still brute-force, String::Find - IMO not bad at all
  
 Mirek
 
 [Updated on: Sat, 01 March 2014 20:21] Report message to a moderator |  
	|  |  |  
	|  |  
	|  |  
	|  |  
	|  | 
 
 
 Current Time: Sun Oct 26 08:51:51 CET 2025 
 Total time taken to generate the page: 0.02244 seconds | 
 | 
 |