CParser

 

class CParser

CParser is a simple yet very useful lexical analyzer suitable for building recursive descent parsers for languages with C-like syntax.

CParser works on '\0'-terminated text in memory (not on a Stream), so to parse a file you have to load it into memory first. The text has to exist for as long as it is being processed by CParser (no copy is made).

Many CParser methods can throw the CParser::Error exception to indicate a failure to parse the required symbol. When using CParser to build a concrete parser, it is common to use this exception (preferably via the ThrowError method) to indicate errors as well.

Routines that handle identifiers allow ASCII letters, digits and the characters '_' and '$'. '$' is not part of the C standard, but it is allowed by JavaScript, JSON, Java and most C/C++ compilers as an extension.

CParser objects cannot be transferred (copied).

 

 

 

Constructor Detail

 

CParser(const char *ptr)

Constructs the CParser which can parse input buffer ptr.

 


 

CParser(const char *ptr, const char *fn, int line = 1)

Constructs the CParser with additional information about the text. The additional information can be used when reporting errors:

ptr

Pointer to the input text.

fn

The name of the file (in fact, it can be anything; the value is just stored).

line

First line number.

 


 

CParser()

Constructs the CParser. Input text has to be assigned using the SetPos method.

 

 

Public Member List

 

void ThrowError(const char *s)

Throws CParser::Error with the error message s.

 

 


 

CParser& SkipSpaces(bool b = true)

Sets the mode of skipping spaces. If b is true, sets CParser to the mode where white-spaces are automatically skipped. First skip is performed when position in input text is assigned via constructor or SetPos, then the skip is performed after any symbol. If b is false, sets CParser to the mode where white-spaces are not automatically skipped, but have to be skipped by Spaces method. Default is true.

 


 

CParser& NoSkipSpaces()

Same as SkipSpaces(false).

 


 

CParser& UnicodeEscape(bool b = true)

Controls whether CParser recognizes \u and \U Unicode escape sequences in strings. This option is active by default.

 


 

CParser& SkipComments(bool b = true)

Sets the behaviour of comments. If active, comments are treated as whitespace (basically, they are ignored). This is the default. Note that SkipComments has to be called before any parsing happens.

 


 

CParser& NoSkipComments()

Same as SkipComments(false).

 


 

CParser& NestComments(bool b = true)

If active, CParser recognizes nested comments (e.g. "/* level1 /* level2 */ */").
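The difference between nested and non-nested comment handling can be illustrated with a small standalone sketch. SkipBlockComment is a hypothetical helper written for this illustration, not part of CParser, and it only models the rule described above under the assumption that an unterminated comment runs to the end of the text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of U++: returns the index just past the
// block comment that starts at s[0], or 0 if s does not start with "/*".
// Assumption: an unterminated comment extends to the end of the text.
std::size_t SkipBlockComment(const char *s, bool nested)
{
    assert(s != nullptr);
    if (s[0] != '/' || s[1] != '*')
        return 0;
    std::size_t i = 2;
    int depth = 1;
    while (s[i] != '\0') {
        if (s[i] == '*' && s[i + 1] == '/') {
            i += 2;
            if (--depth == 0)
                return i;           // matching close found
        }
        else if (nested && s[i] == '/' && s[i + 1] == '*') {
            i += 2;
            ++depth;                // inner comment opens a new level
        }
        else
            ++i;
    }
    return i;
}
```

With nesting enabled, the whole text of the example above is one comment; without nesting, the comment ends at the first "*/" and the trailing " */" is left in the input.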

 


 

CParser& NoNestComments()

Same as NestComments(false).

 


 

bool Spaces()

Skips white-spaces. Returns true if there was any white-space to skip, false otherwise. Stores the position before advancing as the "space pointer", to be retrieved by GetSpacePtr.

 


 

char PeekChar() const

Returns the current single character.

 


 

char GetChar()

Advances the position in the input text by one character and returns the character at the position before advancing.

 


 

bool IsChar(char c) const

Tests whether there is a specific character c at the current position.

 


 

bool IsChar2(char c1, char c2) const

Tests whether there is a specific character pair (c1, c2) at the current position.

 


 

bool IsChar3(char c1, char c2, char c3) const

Tests whether there is a specific character triplet (c1, c2, c3) at the current position.

 


 

bool Char(char c)

Tests for a single character c at the current position. If there is a match, the position is advanced and true is returned. If no match is found, the position remains unmodified and false is returned.

 


 

bool Char2(char c1, char c2)

Tests for a character pair (c1, c2) at the current position. If there is a match, the position is advanced by two characters and true is returned. If no match is found, the position remains unmodified and false is returned.

 


 

bool Char3(char c1, char c2, char c3)

Tests for a character triplet (c1, c2, c3) at the current position. If there is a match, the position is advanced by three characters and true is returned. If no match is found, the position remains unmodified and false is returned.

 


 

void PassChar(char c)

Calls Char(c). If it returns false, throws CParser::Error.
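The relationship between the Char and PassChar families (an optional match versus a required match) can be sketched with a toy cursor. MiniCursor below is a hypothetical stand-in written with the standard library only; it is not the U++ class, it ignores whitespace skipping, and it throws std::runtime_error where CParser would throw CParser::Error.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for illustration only; not the U++ CParser.
struct MiniCursor {
    const char *pos;

    explicit MiniCursor(const char *text) : pos(text) { assert(text != nullptr); }

    // Models the documented Char contract: advance and return true on a
    // match, leave the position untouched and return false otherwise.
    bool Char(char c)
    {
        if (c == '\0' || *pos != c)
            return false;
        ++pos;
        return true;
    }

    // Models the documented PassChar contract: a failed match is an error.
    void PassChar(char c)
    {
        if (!Char(c))
            throw std::runtime_error("expected character not found");
    }
};
```

In a hand-written descent parser, the Char form typically drives optional or alternative branches, while the PassChar form is used where the grammar leaves no choice.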

 


 

void PassChar2(char c1, char c2)

Calls Char2(c1, c2). If it returns false, throws CParser::Error.

 


 

void PassChar3(char c1, char c2, char c3)

Calls Char3(c1, c2, c3). If it returns false, throws CParser::Error.

 


 

bool Id(const char *s)

Tests for the given C-like identifier s. If there is a match, advances the position by strlen(s) characters. Returns true on match and false otherwise.

 


 

void PassId(const char *s) throw(Error)

Invokes the Id method with s as parameter. If it returns false, throws CParser::Error.

 


 

bool IsId() const

Tests whether there is any C-like identifier at the current position.

 


 

bool IsId(const char *s) const

Tests whether there is the C-like identifier s at the current position. Note that the matched text does not actually need to follow C identifier rules, e.g. IsId("family-name") will work. The only thing that is tested is that the matched text is not followed by a character allowed inside a C identifier. (E.g. CParser("family-names").IsId("family-name") is false while CParser("family-name;").IsId("family-name") is true).
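The boundary rule for IsId(s) can be modelled in a few lines of standard C++. MatchesId and IsIdChar are hypothetical names used only for this sketch; the code follows the description on this page (ASCII letters, digits, '_' and '$' count as identifier characters) and is not taken from the U++ sources.

```cpp
#include <cassert>
#include <cstring>

// Characters allowed inside an identifier, per the description on this page:
// ASCII letters, digits, '_' and '$'.
bool IsIdChar(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '$';
}

// Hypothetical model of the IsId(s) rule: text must start with s, and the
// character right after the match must not continue an identifier.
bool MatchesId(const char *text, const char *s)
{
    assert(text != nullptr && s != nullptr);
    std::size_t n = std::strlen(s);
    return std::strncmp(text, s, n) == 0 && !IsIdChar(text[n]);
}
```

Under this model, MatchesId("family-names", "family-name") is false because 's' continues the identifier, while MatchesId("family-name;", "family-name") is true.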

 


 

String ReadId()

Reads C-like identifier from the current position. If there is none, a CParser::Error is thrown.

 


 

String ReadIdh()

Special variant of ReadId that allows hyphens inside the identifier.

 


 

String ReadIdt()

Special variant of ReadId that treats various non-alphanumeric characters as part of the identifier, as long as they form a normal or template-based C++ type.

 


 

bool IsInt() const

Tests for an integer at the current position: there must be either a digit, or a '+' or '-' sign followed by any number of spaces and a digit.
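The lookahead rule for IsInt can be written down as a short standalone function. LooksLikeInt is a hypothetical name, and treating "spaces" as the space and tab characters is an assumption of this sketch, not something this page specifies.

```cpp
#include <cassert>

// Hypothetical model of the IsInt rule: a digit, or a sign followed by
// any number of spaces and then a digit.
// Assumption: "spaces" means ' ' and '\t' here.
bool LooksLikeInt(const char *s)
{
    assert(s != nullptr);
    if (*s == '+' || *s == '-') {
        ++s;
        while (*s == ' ' || *s == '\t')
            ++s;
    }
    return *s >= '0' && *s <= '9';
}
```

Note that the test only looks ahead; nothing is consumed, which matches the const qualifier of IsInt.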

 


 

int Sgn()

If there are '-' or '+' characters at the current position, skips them. If '-' was skipped, returns -1, otherwise 1.

 


 

int ReadInt()

Reads the integer from the current position. If IsInt is false, throws a CParser::Error.

 


 

int ReadInt(int min, int max)

Performs ReadInt and then checks the result to be in min <= result <= max. If not, throws a CParser::Error, otherwise returns it.

 


 

int64 ReadInt64()

Reads the 64-bit integer from the current position. If IsInt is false, throws a CParser::Error.

 


 

int64 ReadInt64(int64 min, int64 max)

Performs ReadInt64 and then checks the result to be in min <= result <= max. If not, throws a CParser::Error, otherwise returns it.

 


 

bool IsNumber() const

Tests for a sign-less number at the current position: there must be a digit at the current position.

 


 

bool IsNumber(int base) const

Tests for a sign-less number with the given base: there must be a digit or a letter 'A' - 'Z' or 'a' - 'z', where the range is limited by the actual base (e.g. for base 12 the letters 'a', 'A', 'b' and 'B' are allowed).
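How the base limits the accepted characters can be shown with a small helper. DigitValue is a hypothetical function for this sketch only, and the supported base range of 2 to 36 is an assumption derived from the ten digits plus twenty-six letters.

```cpp
#include <cassert>

// Hypothetical helper: returns the value of c as a digit in the given base,
// or -1 if c is not a valid digit for that base.
// Assumption: bases from 2 to 36 (digits, then letters, case-insensitive).
int DigitValue(char c, int base)
{
    assert(base >= 2 && base <= 36);
    int v = -1;
    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'z')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z')
        v = c - 'A' + 10;
    return v < base ? v : -1;
}
```

For base 12 this accepts '0' to '9' plus 'a', 'A', 'b' and 'B', and rejects 'c'.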

 


 

uint32 ReadNumber(int base = 10)

Reads a number with the given numeric base. If IsNumber(base) is false, throws a CParser::Error.

 


 

uint64 ReadNumber64(int base = 10)

Reads a 64-bit unsigned number with the given numeric base.

 


 

bool IsDouble() const

Tests for a floating point number at the current position: there must be either a digit, or a '+' or '-' sign followed by any number of spaces and a digit.

 


 

bool IsDouble2() const

Similar to IsDouble, but also allows double number to start with decimal point, like '.21'.

 


 

double ReadDouble()

Reads a floating point number with C based lexical rules. As an exception to C lexical rules, ReadDouble also recognizes form starting with decimal point, like ".21".

 


 

double ReadDoubleNoE()

Special variant of ReadDouble that ignores the exponential part of the number. E.g. CParser("1.2em").ReadDoubleNoE() returns 1.2 (and does not throw an error for an invalid double).
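The effect of ignoring the exponent can be seen in a standalone model. ParseDoubleNoE is a hypothetical function, and the accepted grammar (optional sign, digits, optional fraction) is an assumption of this sketch rather than the exact U++ rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model: parses [sign] digits [ '.' digits ] and deliberately
// stops before any exponent. *consumed receives the number of characters
// read, or 0 if no number was found.
double ParseDoubleNoE(const char *s, std::size_t *consumed)
{
    assert(s != nullptr && consumed != nullptr);
    std::size_t i = 0;
    if (s[i] == '+' || s[i] == '-')
        ++i;
    std::size_t digits = 0;
    while (s[i] >= '0' && s[i] <= '9') { ++i; ++digits; }
    if (s[i] == '.') {
        ++i;
        while (s[i] >= '0' && s[i] <= '9') { ++i; ++digits; }
    }
    if (digits == 0) {
        *consumed = 0;
        return 0.0;
    }
    *consumed = i;
    return std::stod(std::string(s, i));   // convert only the scanned prefix
}
```

For the input "1.2em" the scan stops in front of 'e', so the unit suffix "em" stays in the input for the caller to read, which is what makes this variant useful for CSS-like values.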

 


 

bool IsString() const

Tests for C-like string literal at the current position. Same as IsChar('\"');

 


 

String ReadOneString(bool chkend = true)

Reads a C-like string literal from the current position (follows C lexical rules, including escape codes). Literals on different lines are not concatenated (unlike C). When chkend is false, ReadOneString is more permissive, as it allows unterminated string literals: the string is then also delimited by the end of the line or text.

 


 

String ReadString(bool chkend = true)

Reads a C-like string literal from the current position (follows C lexical rules, including escape codes). Literals on different lines are concatenated (as in C). When chkend is false, ReadString is more permissive, as it allows unterminated string literals: the string is then also delimited by the end of the line or text.

 


 

String ReadOneString(int delim, bool chkend = true)

Reads a C-like string literal from the current position (follows C lexical rules, including escape codes), using the delimiter delim instead of '\"'. Literals on different lines are not concatenated (unlike C). When chkend is false, ReadOneString is more permissive, as it allows unterminated string literals: the string is then also delimited by the end of the line or text.

 


 

String ReadString(int delim, bool chkend = true)

Reads a C-like string literal from the current position (follows C lexical rules, including escape codes), using the delimiter delim instead of '\"'. Literals on different lines are concatenated (as in C). When chkend is false, ReadString is more permissive, as it allows unterminated string literals: the string is then also delimited by the end of the line or text.

 


 

void Skip()

Skips a single symbol. Decimal numbers, identifiers and string literals are skipped as whole symbols, otherwise input position is advanced by 1 character.

 


 

void SkipTerm()

Same as Skip, legacy name.

 


 

const char *GetPtr() const

Returns a pointer to the current position.

 


 

const char *GetSpacePtr() const

Returns a pointer to the position of the last whitespace before the current position, or the current position if there was no whitespace. This pointer is set at the start of each Spaces call; Spaces is called after each processed token (unless SkipSpaces is false).

 


 

Pos GetPos() const

Gets the current position. It contains the pointer as well as the line number and the filename.

 


 

void SetPos(const CParser::Pos& p)

Sets the current position to p. p can be from a different CParser.

 


 

bool IsEof() const

Tests for the end of the input text.

 


 

operator bool() const

Returns true if end of file has not been reached, false otherwise.

 


 

int GetLine() const

Returns the current line number.

 


 

int GetColumn(int tabsize = 4) const

Returns the current column, with given tabsize.
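The role of tabsize in a column computation can be sketched as follows. ColumnOf is a hypothetical function; 1-based columns and tabs that advance to the next multiple of tabsize are assumptions of this model, not a statement about the exact U++ implementation.

```cpp
#include <cassert>

// Hypothetical model: column of `pos` on the line that starts at `line`.
// Assumptions: columns are 1-based and a tab jumps to the next tab stop.
int ColumnOf(const char *line, const char *pos, int tabsize)
{
    assert(line != nullptr && pos != nullptr && line <= pos && tabsize > 0);
    int col = 1;
    for (const char *s = line; s < pos; ++s) {
        if (*s == '\t')
            col = (col - 1) / tabsize * tabsize + tabsize + 1; // next tab stop
        else
            ++col;
    }
    return col;
}
```

The same position therefore reports a different column depending on tabsize, which is why the value should match the tab width of the editor that will display the error.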

 


 

String GetFileName() const

Returns the actual filename.

 

 


 

static String LineInfoComment(const String& filename, int line = 1, int column = 1)

This function creates a special comment that, when parsed by CParser, switches the filename and line number. This is meant to improve error reporting in situations where the parsed text is actually the result of e.g. include operations on some original files. Lexically, the result is treated as a comment. The comment is created using the LINEINFO_ESC character as the begin/end delimiter (the current value is '\2').

 


 

String GetLineInfoComment(int tabsize = 4) const

Calls LineInfoComment(GetFileName(), GetLine(), GetColumn(tabsize)) - creates a comment to identify current file position in further processing.

 


 

void Set(const char *ptr, const char *fn, int line = 1)

Sets the new input string (with filename and line).

 


 

void Set(const char *ptr)

Sets the new input string.

 

 

CParser::Error

 

struct Error : public Exc

 

Type used as CParser exception. Contains single String with error description.

 

Derived from Exc

 

 

Constructor Detail

 

Exc()

Default constructor. Error message is empty.

 


 

Exc(const String& desc)

Construct an Error with desc as an error message.

 

 

 

 

CParser::Pos

 

struct Pos

 

Position in parsed text.

 

Constructor Detail

 

Pos(const char *ptr = NULL, int line = 1, String fn = Null)

Constructs a Pos based on a pointer into the input buffer, the line number and the name of the file.

ptr

Pointer to the position in the input text

line

Line number.

fn

Filename.

 

Public Member List

 

const char *ptr

Pointer to the position in the input text

 


 

const char *wspc

Pointer to the position of the last whitespace before the current position, or the current position if there was no whitespace.

 


 

const char *lineptr

Pointer to the beginning of the last line.

 


 

int line

Line number.

 


 

String fn

Filename.

 


 

int GetColumn(int tabsize = 4) const

Returns the column, for given tabsize.

 

 

C-like string literal formatting

 

AsCString routines produce C-like literals (compatible with CParser) from character data:

 

String AsCString(const char *s, const char *end, int linemax = INT_MAX, const char *linepfx = NULL, dword flags = 0)

Creates C-like literal.

s

Pointer to characters.

end

End of characters array ('\0' characters are allowed inside data).

linemax

Maximal length of line. If this is exceeded, ending "\"\n" and linepfx is inserted and literal continues on the new line.

linepfx

Pointer to zero-terminated text to be inserted at the beginning of the line when the line length is exceeded.

flags

A combination of flags:

    ASCSTRING_SMART    breaks string into lines when too long

    ASCSTRING_OCTALHI    escapes characters >128

    ASCSTRING_JSON    uses JSON notation for escapes (\u0001 instead of \001)

 

Return value

C-like literal.

 


 

String AsCString(const char *s, int linemax = INT_MAX, const char *linepfx = NULL, dword flags = 0)

Creates a C-like literal from zero-terminated character data. Same as AsCString(s, s + strlen(s), linemax, linepfx, flags).

 


 

String AsCString(const String& s, int linemax = INT_MAX, const char *linepfx = NULL, dword flags = 0)

Creates a C-like literal from a String. The String can contain zero characters. Same as AsCString(s.Begin(), s.End(), linemax, linepfx, flags).
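What "producing a C-like literal" involves can be shown with a simplified standalone model. ToCLiteral is a hypothetical function, not AsCString itself: it ignores linemax, linepfx and flags, and the exact set of escapes it emits (named escapes for quote, backslash, newline, carriage return and tab, three-digit octal for other control characters) is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of C-literal formatting; not AsCString.
// Takes a [s, end) range so that '\0' characters inside the data are allowed.
std::string ToCLiteral(const char *s, const char *end)
{
    assert(s != nullptr && end != nullptr && s <= end);
    std::string out = "\"";
    for (; s < end; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 32 || c == 127) {
                // Always three octal digits, so a following digit in the
                // data cannot be misread as part of the escape.
                out += '\\';
                out += static_cast<char>('0' + ((c >> 6) & 7));
                out += static_cast<char>('0' + ((c >> 3) & 7));
                out += static_cast<char>('0' + (c & 7));
            }
            else
                out += static_cast<char>(c);
        }
    }
    out += '"';
    return out;
}
```

Passing an explicit end pointer instead of relying on a terminator is what lets the first AsCString overload handle binary data; the other two overloads only compute that end for the caller.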

 

 
