U++ forum: Welcome to the forum

Search on this site

Search in forums

Home » Community » Coffee corner » Substring search algorithm

Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend

Substring search algorithm [message #35531]

Sun, 26 February 2012 10:55

chickenk
Messages: 171
Registered: May 2007
Location: Grenoble, France

Experienced Member

Hello,

I have a very bad knowledge of algorithms in general (I'm not a math guy...) but I found this article and I thought it could be interesting sharing it, results seem to be good (I know, I know, benchmarks have no value when they are made by the algorithm creator... anyway...) : http://volnitsky.com/project/str_search/index.html

I don't know if there are any specific substring searching algorithm in U++, but that may be interesting to compare and see who can perform better. The reference implementation is in C++, and there are fallbacks to std::search() in some places, which could be replaced by fallbacks to U++ own search implementation. I am really curious about the results from such a combination...

Hope there's something interesting to take from this algo.

Cheers
Lionel

Report message to a moderator

Re: Substring search algorithm [message #35532 is a reply to message #35531]

Sun, 26 February 2012 13:14

mirek
Messages: 14265
Registered: November 2005

Ultimate Member

Quite interesting idea. Have to admit took me hour+ to understand the algorithm; perhaps part of initial misunderstanding was that if I understand it well, the substring length is limited to 64KB (or H_size) (code suggest something like offset_t maximum).

Also, even the the original code suggests that it only works fine for searched strings > 20kb, because initialization costs are pretty high - that is a bit hight for Upp::String.

OTOH, it might be interesting to try this with VectorMap instead of that hash thing in the code. Very likely, it would be quite faster and init costs would be much smaller.

Mirek

Report message to a moderator

Re: Substring search algorithm [message #35535 is a reply to message #35532]

Mon, 27 February 2012 10:55

mr_ped
Messages: 826
Registered: November 2005
Location: Czech Republic - Praha

Experienced Contributor

"Boyer-Moore-Horspool" and family of those are suitable for U++ implementation (I did it years ago in Pascal at uni), but I wonder what's the real speed up benefit in real world app, because nowadays the CPU + L1 cache operates light years faster than L2+ cache/RAM, so while BMH will save you some compares, it has to fetch the full text anyway, and I would expect a properly implemented naive byte compare will easily do it's work inside the "time window" of memory fetch of next data.

But I may try to write BMH for U++ and toy around with that for a while to see some real numbers.

[Updated on: Mon, 27 February 2012 10:55]

Report message to a moderator

Previous Topic:	Intersrv/interlnk
Next Topic:	Rendering

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

] [

]

Current Time: Sun Jul 13 23:22:22 CEST 2025

Total time taken to generate the page: 0.03917 seconds