Surprising behavior of CParser [message #43042] Sat, 26 April 2014 13:18
dolik.rce

Hi everyone,

I want to share a bit of knowledge about CParser that I just learned the hard way ;) This is not a rant, rather a cautionary tale:

There is a method SkipTerm, which does exactly that: it skips one term. When in Spaces(true) mode, which is CParser's default, it also skips any whitespace after the term. So far that sounds reasonable and logical...

The surprising part to me was that comments (both /* */ and //) are considered whitespace. Well, they fit the definition well. It kind of makes CParser unusable for many other languages, but I guess it can be explained by the 'C' in CParser :)

I hit this problem in the Ini parser, which is part of U++ (in one particular part of it that I contributed myself, so I'll also have to fix it ;) ). An Ini file is definitely not a C-like file and should not be treated like one. I forgot about this, and perhaps Mirek did as well, when he applied my patch that replaces environment variables with their values.

For illustration, here is a very simplified example:
	CParser p("http://example.com");
	while(!p.IsEof()) {
		if(p.IsId())
			Cout() << p.ReadId() << "\n";
		else
			p.SkipTerm();
	}
I expected code like this to print every id in the string, that is "http", "example" and "com". In reality it prints only "http", because everything after "//" is discarded as a comment when SkipTerm() is called to skip ":".

I'm aware that this would be rather hard to fix in a backward-compatible way. Perhaps adding a CParser::Comments(bool enable=true) method that would turn this behavior off only when required would be a good idea. The main reason I'm writing this post is to warn the rest of the U++ users: it is dangerous to treat CParser as a generic parser applicable to any text file. It is certainly possible to parse almost anything with it, but one has to be really careful.
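
To make the mechanism concrete, here is a small stand-alone sketch in plain C++. It is not U++ code: MiniParser, CollectIds and the skip_comments flag are hypothetical names made up for illustration, and the real CParser is considerably more elaborate. The sketch only assumes what is described above, namely that whitespace is skipped after each id or term and that comments count as whitespace. The flag plays the role of the proposed on/off switch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant part of CParser; NOT the U++ implementation.
class MiniParser {
public:
    MiniParser(const std::string& text, bool skip_comments)
        : s_(text), pos_(0), skip_comments_(skip_comments) { SkipSpaces(); }

    bool IsEof() const { return pos_ >= s_.size(); }

    bool IsId() const {
        return !IsEof() && (std::isalpha((unsigned char)s_[pos_]) || s_[pos_] == '_');
    }

    std::string ReadId() {
        std::size_t start = pos_;
        while(pos_ < s_.size() && (std::isalnum((unsigned char)s_[pos_]) || s_[pos_] == '_'))
            pos_++;
        std::string id = s_.substr(start, pos_ - start);
        SkipSpaces();
        return id;
    }

    // Simplification: a "term" is a single character here.
    void SkipTerm() {
        if(!IsEof())
            pos_++;
        SkipSpaces();   // this is where a following "//" swallows the rest of the line
    }

private:
    void SkipSpaces() {
        for(;;) {
            while(pos_ < s_.size() && std::isspace((unsigned char)s_[pos_]))
                pos_++;
            if(!skip_comments_ || pos_ + 1 >= s_.size() || s_[pos_] != '/')
                return;
            if(s_[pos_ + 1] == '/') {            // line comment: runs to end of line
                while(pos_ < s_.size() && s_[pos_] != '\n')
                    pos_++;
            }
            else if(s_[pos_ + 1] == '*') {       // block comment: runs to closing */
                std::size_t end = s_.find("*/", pos_ + 2);
                pos_ = end == std::string::npos ? s_.size() : end + 2;
            }
            else
                return;                          // a lone '/' is an ordinary term
        }
    }

    std::string s_;
    std::size_t pos_;
    bool        skip_comments_;
};

// Same loop as in the example above, but collecting the ids instead of printing them.
std::vector<std::string> CollectIds(const std::string& text, bool skip_comments) {
    std::vector<std::string> ids;
    MiniParser p(text, skip_comments);
    while(!p.IsEof()) {
        if(p.IsId())
            ids.push_back(p.ReadId());
        else
            p.SkipTerm();
    }
    return ids;
}
```

With this sketch, CollectIds("http://example.com", true) returns only "http", because skipping ":" lands right in front of "//". CollectIds("http://example.com", false) returns "http", "example" and "com", since the slashes are then skipped as ordinary terms.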

Hope this helps someone :)

Honza
Re: Surprising behavior of CParser [message #43046 is a reply to message #43042] Sat, 26 April 2014 20:05
mirek
dolik.rce wrote on Sat, 26 April 2014 13:18

Perhaps adding a CParser::Comments(bool enable=true) method that would turn this behavior off only when required would be a good idea.


Added as SkipComments/NoSkipComments.

Mirek
Re: Surprising behavior of CParser [message #43048 is a reply to message #43046] Sun, 27 April 2014 00:07
dolik.rce

Great, thanks!

Honza