U++ forum: Welcome to the forum

Search on this site

Search in forums

Home » U++ Library support » U++ Core » String should implement the Boyer Moore algo

Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend

String should implement the Boyer Moore algo [message #42198]

Thu, 27 February 2014 16:05

victorb
Messages: 78
Registered: December 2005
Location: Nice, France

Member

Hi all,

I used to use and contribute to U++ but it has been a long time since I haven't even started theIDE.

It's even more impressive that it was back then... Mirek, you're a genius!

I'm more developping web apps today and I have been looking at the
Boyer-Moore algo to speed up string search. It seems that U++ uses
a brute force algo, it should be able to do much better.

Please find some reference implementation below:
https://github.com/facebook/folly/blob/master/folly/FBString .h#L1796

http://doxygen.postgresql.org/varlena_8c_source.html

Keep up the good waork guys !

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42220 is a reply to message #42198]

Fri, 28 February 2014 11:50

mirek
Messages: 14000
Registered: November 2005

Ultimate Member

Thing is, Find is meant for searching in relatively small Strings. IMO, in that case, the cost of precreating search tables in B-M algo outweights any possible gains. It also seams to me that for the same reason, such utility, which would definitely be useful in some cases, should be rather implemented as "search class" - because you can then "compile" the pattern once, then use for searching several times.

So for String::Find, let us consider the "brute force" a feature.

I would be nice to have BM as separate thing (class probably).

But another thing is that perhaps full PCRE has similar characteristics to BM but much more flexible use... (but I might be wrong about this one...)

Mirek

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42221 is a reply to message #42220]

Fri, 28 February 2014 12:36

victorb
Messages: 78
Registered: December 2005
Location: Nice, France

Member

Mirek,

The one implementation I've linked from FB is without pre-computations. It should be faster in any case - may be there should be a switch when looking one char with a tighter loop.

Folly (the lib from Facebook) is designed to be very fast, they claim a 30x speed increase for "casual cases" FWIW.

I'm really unsure if PCRE can perform as good as BM.

Victor

[Updated on: Fri, 28 February 2014 12:37]

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42223 is a reply to message #42221]

Fri, 28 February 2014 13:54

mirek
Messages: 14000
Registered: November 2005

Ultimate Member

Well, but that code is definitely not Boyer-Moore, but " Boyer-Moore-like trick " (it is in the comments Wink

Albeit it looks like quite a clever trick, fixing the most profound cases where brute force fails.

Mirek

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42246 is a reply to message #42223]

Sat, 01 March 2014 17:05

mirek
Messages: 14000
Registered: November 2005

Ultimate Member

After further analysing folly code, I am quite confused by where that 30x times speedup claim comes from.

If I can see the code right, the about only situation where any speedup compared to brute force happens is when the last character of string is matched, which is perhaps important for some corner cases, but for general case it is virtually useless...

(In contrast, _proper_ Boyer-Moore skips almost always, but well, that requires prebuilding those skip tables...)

But perhaps I am missing something?

Mirek

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42249 is a reply to message #42246]

Sat, 01 March 2014 20:19

mirek
Messages: 14000
Registered: November 2005

Ultimate Member

Well, I could not sleep about this, so I have decided to put it to test Smile

I have created "proper" BM and BMH algorithm, then started benchamrking. Loaded 120MB XML file into String, appended "Hello world" (not present in the file otherwise) and started benchmarking.

At first, U++ brute force was 10 times slower than others. Analyzing it I have found that it is not particulary well optimized (as brute force), calling memcmp on each input byte. So I have added some microoptimizations and this is what I got then:

------------
Needle: Hello world
U++ Brute force: 127158988
  Time elapsed: 0.037
Folly: 127158988
  Time elapsed: 0.059
Boyer-Moore: 127158988
  Time elapsed: 0.082
Boyer�Moore�Horspool: 127158988
  Time elapsed: 0.049
------------
Needle: Hel
U++ Brute force: 127158988
  Time elapsed: 0.036
Folly: 127158988
  Time elapsed: 0.063
Boyer-Moore: 127158988
  Time elapsed: 0.270
Boyer�Moore�Horspool: 127158988
  Time elapsed: 0.154

First numbers are for the whole "Hello world" search, then i have searched only for "Hel", which accidentally is not in the file too.

I believe that the only real speed advantage the folly algorithm has lies in that simple loop

     while (i[nsize_1] != lastNeedle) {
        if (++i == iEnd) {
          // not found
          return -1;
        }
      }

implementing de-facto brutal force approach. Any "smart" skips are not frequent enough to change anything.

As for BM and BMH, it looks like the management of skips in reality and in modern CPU is too costly as compared to streamlined comparison loops.... and becomes really a burden when needle is short. Also interesting is that simpler, less clever variant BMH (BMH is in fact simplified BM) is faster... I guess complexity in the loop shows.

Well, at any case, the real result of this endeavour is optimized, albeit still brute-force, String::Find - IMO not bad at all Smile

Mirek

[Updated on: Sat, 01 March 2014 20:21]

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42254 is a reply to message #42249]

Sun, 02 March 2014 15:36

zsolt
Messages: 698
Registered: December 2005
Location: Budapest, Hungary

Contributor

Thanks

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42256 is a reply to message #42254]

Sun, 02 March 2014 17:32

victorb
Messages: 78
Registered: December 2005
Location: Nice, France

Member

Mirek,

Great work !

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42261 is a reply to message #42198]

Sun, 02 March 2014 19:17

nlneilson
Messages: 644
Registered: January 2010
Location: U.S. California. Mojave &...

Contributor

Great!

It seemed to be fast enough before but should be even faster now.

Adding the option to not close the search window as often was great also.

Report message to a moderator

Re: String should implement the Boyer Moore algo [message #42262 is a reply to message #42261]

Sun, 02 March 2014 19:22

mirek
Messages: 14000
Registered: November 2005

Ultimate Member

Frankly, I am still wondering why "smart" Boyer-Moore is slower. It is possible that I have faulty implementation (I just adopted the code from wikipedia).

Anyone wishing to play with this is encouraged to try benchmarks/StringFind.

Mirek

Report message to a moderator

Previous Topic:	Bug: AString<B>::ReverseFind broken for wchar
Next Topic:	AMap's FindAdd, FindPut API changed

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

] [

]

Current Time: Sat Jul 27 03:19:55 CEST 2024

Total time taken to generate the page: 0.02692 seconds