|
|
Home » U++ Library support » U++ Core » String should implement the Boyer Moore algo
|
|
Re: String should implement the Boyer Moore algo [message #42221 is a reply to message #42220] |
Fri, 28 February 2014 12:36   |
victorb
Messages: 78 Registered: December 2005 Location: Nice, France
|
Member |
|
|
Mirek,
The one implementation I've linked from FB is without pre-computations. It should be faster in any case - may be there should be a switch when looking one char with a tighter loop.
Folly (the lib from Facebook) is designed to be very fast, they claim a 30x speed increase for "casual cases" FWIW.
I'm really unsure if PCRE can perform as good as BM.
Victor
[Updated on: Fri, 28 February 2014 12:37] Report message to a moderator
|
|
|
|
|
Re: String should implement the Boyer Moore algo [message #42249 is a reply to message #42246] |
Sat, 01 March 2014 20:19   |
 |
mirek
Messages: 14255 Registered: November 2005
|
Ultimate Member |
|
|
Well, I could not sleep about this, so I have decided to put it to test 
I have created "proper" BM and BMH algorithm, then started benchamrking. Loaded 120MB XML file into String, appended "Hello world" (not present in the file otherwise) and started benchmarking.
At first, U++ brute force was 10 times slower than others. Analyzing it I have found that it is not particulary well optimized (as brute force), calling memcmp on each input byte. So I have added some microoptimizations and this is what I got then:
------------
Needle: Hello world
U++ Brute force: 127158988
Time elapsed: 0.037
Folly: 127158988
Time elapsed: 0.059
Boyer-Moore: 127158988
Time elapsed: 0.082
Boyer–Moore–Horspool: 127158988
Time elapsed: 0.049
------------
Needle: Hel
U++ Brute force: 127158988
Time elapsed: 0.036
Folly: 127158988
Time elapsed: 0.063
Boyer-Moore: 127158988
Time elapsed: 0.270
Boyer–Moore–Horspool: 127158988
Time elapsed: 0.154
First numbers are for the whole "Hello world" search, then i have searched only for "Hel", which accidentally is not in the file too.
I believe that the only real speed advantage the folly algorithm has lies in that simple loop
while (i[nsize_1] != lastNeedle) {
if (++i == iEnd) {
// not found
return -1;
}
}
implementing de-facto brutal force approach. Any "smart" skips are not frequent enough to change anything.
As for BM and BMH, it looks like the management of skips in reality and in modern CPU is too costly as compared to streamlined comparison loops.... and becomes really a burden when needle is short. Also interesting is that simpler, less clever variant BMH (BMH is in fact simplified BM) is faster... I guess complexity in the loop shows.
Well, at any case, the real result of this endeavour is optimized, albeit still brute-force, String::Find - IMO not bad at all 
Mirek
[Updated on: Sat, 01 March 2014 20:21] Report message to a moderator
|
|
|
|
|
|
|
Goto Forum:
Current Time: Tue Apr 29 08:40:37 CEST 2025
Total time taken to generate the page: 0.03451 seconds
|
|
|