Problem when parsing HTML with XmlParser [message #48805] Fri, 22 September 2017 10:40
akebee
	String xml =
		"<html>"
		"<head><meta content=\"always\" name=\"referrer\"></head>"
		"<TITLE>A Midsummer Night's Dream</TITLE>"
		"</html>";

	XmlParser p(xml);
	while(!p.IsTag())
		p.Skip();
	p.PassTag("html");
	while(!p.End())
		if(p.Tag("head"))
		{
			while(!p.End()) {
				if(p.TagE("meta")) {
					// crashes here because "<meta" doesn't have "/>"
				}
				else
					p.Skip();
			}
		}
		else
		if(p.Tag("TITLE")) {
			String TITLE = p.ReadText();
			LOG(TITLE);
			p.PassEnd();
		}
		else
			p.Skip();


The "<meta" tag inside "<head>" doesn't end with "/>", so the program crashes.
Is there any way to skip "<head>"?
Re: Problem when parsing HTML with XmlParser [message #48806 is a reply to message #48805] Sat, 23 September 2017 09:35
mirek
First of all, it is not a crash but an exception (XmlError), which you can catch:

#include <Core/Core.h>

using namespace Upp;

CONSOLE_APP_MAIN
{
	String xml =
		"<html>"
		"<head><meta content=\"always\" name=\"referrer\"></head>"
		"<TITLE>A Midsummer Night's Dream</TITLE>"
		"</html>";

	XmlParser p(xml);
	try {
		// skip everything that precedes the first tag
		while(!p.IsTag())
			p.Skip();
		p.PassTag("html");
		while(!p.End())
			if(p.Tag("head"))
			{
				while(!p.End()) {
					if(p.TagE("meta")) {
						// *exception* here because "<meta" doesn't have "/>"
					}
					else
						p.Skip();
				}
			}
			else
			if(p.Tag("TITLE")) {
				String TITLE = p.ReadText();
				LOG(TITLE);
				p.PassEnd();
			}
			else
				p.Skip();
	}
	catch(XmlError) {
		// the strict parser reports the unclosed "<meta" here
		LOG("ERROR!");
	}
}



That covers the case where you only need to detect invalid XML, not parse it.
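
To illustrate why a strict parser rejects this input, here is a minimal sketch in plain standard C++. It does not use U++, and CheckBalance is a hypothetical helper written only for this illustration, not a library function. It keeps a stack of open element names and reports the first start tag / end tag mismatch. It is deliberately simplified: comments, CDATA sections and a '>' inside an attribute value are not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of U++): returns an empty string when every
// start tag is matched by its end tag, otherwise a short description of the
// first problem found.
std::string CheckBalance(const std::string& markup)
{
	std::vector<std::string> open; // names of elements that are still open
	size_t pos = 0;
	while((pos = markup.find('<', pos)) != std::string::npos) {
		size_t close = markup.find('>', pos);
		if(close == std::string::npos)
			return "unterminated tag";
		std::string body = markup.substr(pos + 1, close - pos - 1);
		pos = close + 1;
		if(body.empty())
			return "empty tag";
		bool is_end = body.front() == '/';  // "</name>"
		bool is_empty = body.back() == '/'; // "<name ... />"
		if(is_end)
			body.erase(0, 1);
		// the element name runs up to the first blank or '/'
		std::string name = body.substr(0, body.find_first_of(" \t\r\n/"));
		if(is_end) {
			if(open.empty())
				return "unexpected end tag </" + name + ">";
			if(open.back() != name)
				return "mismatch: <" + open.back() + "> closed by </" + name + ">";
			open.pop_back();
		}
		else
		if(!is_empty)
			open.push_back(name); // element stays open until its end tag
	}
	return open.empty() ? std::string() : "unclosed <" + open.back() + ">";
}
```

For the markup in this thread the function returns "mismatch: <meta> closed by </head>". If the meta tag is written with "/>" instead, it returns an empty string.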

U++ can even parse invalid XML if you set the Relaxed or Raw flag on the parser:


#include <Core/Core.h>

using namespace Upp;

CONSOLE_APP_MAIN
{
	String xml =
		"<html>"
		"<head><meta content=\"always\" name=\"referrer\"></head>"
		"<TITLE>A Midsummer Night's Dream</TITLE>"
		"</html>";

	XmlParser p(xml);
	p.Relaxed(); // accept input that is not valid XML
	try {
		// skip everything that precedes the first tag
		while(!p.IsTag())
			p.Skip();
		p.PassTag("html");
		while(!p.IsEof())
			if(p.Tag("head"))
			{
				while(!p.End("head")) {
					if(p.Tag("meta")) {
						// attributes of the current tag are read with operator[]
						DUMP(p["content"]);
					}
					else
						p.Skip();
				}
			}
			else
			if(p.Tag("TITLE")) {
				String TITLE = p.ReadText();
				DUMP(TITLE);
				p.PassEnd();
			}
			else
				p.Skip();
	}
	catch(XmlError e) {
		LOG("ERROR: " << e);
	}

	LOG("---- Done");
}
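
If relaxed parsing is not an option, another possibility is to make the markup well-formed before a strict parser sees it. The following is only a sketch in plain standard C++, under the assumption that you know which HTML elements in your input are written without an end tag (here "meta"). CloseVoidTags is a hypothetical helper, not a U++ function. It matches names case-sensitively and does not handle comments or a '>' inside an attribute value.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not part of U++): rewrites every start tag whose name
// is listed in void_tags so that it ends with "/>" and becomes an empty
// element in XML terms. All other text is copied unchanged.
std::string CloseVoidTags(const std::string& html, const std::set<std::string>& void_tags)
{
	std::string out;
	size_t pos = 0;
	for(;;) {
		size_t lt = html.find('<', pos);
		if(lt == std::string::npos)
			break;
		size_t gt = html.find('>', lt);
		if(gt == std::string::npos)
			break;
		out.append(html, pos, lt - pos);            // text before the tag
		std::string tag = html.substr(lt, gt - lt); // tag without the final '>'
		// the name runs from just after '<' up to the first blank or '/';
		// for an end tag such as "</head" this yields an empty name
		size_t name_end = tag.find_first_of(" \t\r\n/", 1);
		std::string name = name_end == std::string::npos ? tag.substr(1)
		                                                 : tag.substr(1, name_end - 1);
		bool self_closed = tag.back() == '/';
		out += tag;
		if(void_tags.count(name) && !self_closed)
			out += '/'; // turn "<meta ...>" into "<meta .../>"
		out += '>';
		pos = gt + 1;
	}
	out.append(html, pos, std::string::npos); // whatever follows the last tag
	return out;
}
```

Called with the markup from this thread and the set {"meta"}, it changes only the meta tag, which then ends with "/>". Tags that are already self-closed are left alone, so running it twice gives the same result.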


Re: Problem when parsing HTML with XmlParser [message #48813 is a reply to message #48806] Mon, 25 September 2017 03:19
akebee
Thanks very much!