HtmlTools package for U++ (a libtidy wrapper providing HTML5 parser, sanitizer, and prettifier tools)
HtmlTools package for U++ [message #59956] Mon, 19 June 2023 01:22
Oblivion
Messages: 1094
Registered: August 2007
Senior Contributor
Hi,

I am happy to announce that U++ is about to gain something its users have long been missing: a very powerful HTML parser/sanitizer/prettifier, the HtmlTools package.

This package is basically a libtidy binding/wrapper, bringing the power and performance of one of the oldest and most widely used HTML libraries to U++.

You can find the initial version of the source and example code here.

DONE:
+ Implemented HtmlNode class. (A modification of Upp::XmlNode class)
+ Implemented TidyHtmlParser, TidyHtmlParser::Node classes for traversing the document tree.
+ Implemented ParseHtml and RepairHtml convenience functions.
+ Added a minimal code example, parsing the legacy example.com.

TODO:
- Enable U++'s memory managers in libtidy.
- Add U++ callbacks for libtidy's message queue.
- Refactor buffer allocation code.
- Add Topic++ documentation.
- Add more example code (both console & gui).
- Test the Windows build.
- Cosmetics.




The basic example, downloading and parsing example.com:

#include <Core/Core.h>
#include <Core/SSL/SSL.h>
#include <HtmlTools/HtmlTools.h>

using namespace Upp;

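// Recursively walks the document tree and prints the title,
// the paragraph text, and the link targets.
void PrintHtml(const HtmlNode& node)
{
	for(const HtmlNode& q : node) {
		if(q.IsTag("title"))
			Cout() << q.GatherText() << EOL;
		else
		if(q.IsTag("p"))
			Cout() << q.GatherText() << EOL;
		else
		if(q.IsTag("a")) // Attr(0): first attribute of the tag, expected to be href here
			Cout() << "For more information, see: " << q.Attr(0) << EOL;
		PrintHtml(q); // descend into the child nodes
	}
}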

CONSOLE_APP_MAIN
{
	StdLogSetup(LOG_FILE);
	HtmlNode n = ParseHtml(
		HttpRequest("https://example.com/").Execute(),
		{ { "wrap", 96 } }); // libtidy options...
	PrintHtml(n);
}
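
For readers who want to try the traversal pattern without U++ or libtidy installed, here is a minimal standalone sketch in standard C++. Note that ToyNode, GatherText, CollectLines, and MakeSampleDocument are hypothetical names invented for this illustration, and the sample URL is made up; they are not part of HtmlTools, and the real HtmlNode API may differ. The sketch only mirrors the recursion used in PrintHtml above and collects the lines into a vector instead of printing them.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed HTML node (NOT the HtmlTools HtmlNode class).
struct ToyNode {
    std::string tag;                       // element name; empty for text nodes
    std::string text;                      // content of a text node
    std::vector<std::string> attr_values;  // attribute values in document order
    std::vector<ToyNode> children;         // child nodes in document order
};

// Concatenates the content of all text nodes at or below `node`, depth first.
std::string GatherText(const ToyNode& node)
{
    std::string out = node.text;
    for (const ToyNode& child : node.children)
        out += GatherText(child);
    return out;
}

// Same walk as PrintHtml in the post, but appends each line to `out`.
void CollectLines(const ToyNode& node, std::vector<std::string>& out)
{
    for (const ToyNode& q : node.children) {
        if (q.tag == "title" || q.tag == "p")
            out.push_back(GatherText(q));
        else if (q.tag == "a" && !q.attr_values.empty())
            out.push_back("For more information, see: " + q.attr_values[0]);
        CollectLines(q, out);  // descend into the child nodes
    }
}

// Builds a tiny hand-made tree; the contents are sample data, not real parser output.
ToyNode MakeSampleDocument()
{
    ToyNode title_text{"", "Example Domain", {}, {}};
    ToyNode title{"title", "", {}, {title_text}};
    ToyNode link_text{"", "More information...", {}, {}};
    ToyNode link{"a", "", {"https://example.org/more"}, {link_text}};
    ToyNode para_text{"", "This is a sample paragraph. ", {}, {}};
    ToyNode para{"p", "", {}, {para_text, link}};
    ToyNode head{"head", "", {}, {title}};
    ToyNode body{"body", "", {}, {para}};
    ToyNode html{"html", "", {}, {head, body}};
    return ToyNode{"", "", {}, {html}};
}
```

To use it, call CollectLines(MakeSampleDocument(), lines) with an empty std::vector<std::string> named lines, then print each entry. Because the walk keeps descending after handling a p tag, a link nested in a paragraph shows up twice: once as part of the paragraph text and once on its own "For more information" line.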




Note that the package is still experimental.

Feedback is welcome.

Enjoy!

Best regards,
Oblivion


[Updated on: Mon, 19 June 2023 01:24]


Re: HtmlTools package for U++ [message #59960 is a reply to message #59956] Tue, 20 June 2023 23:48
Oblivion
Hi,

The HtmlTools package (a libtidy wrapper for U++) has been updated. Hopefully it will be available on UppHub very soon.

It now compiles on Windows too. The library is configured to be statically linked.

API docs have been added.

Best regards,
Oblivion


Re: HtmlTools package for U++ [message #59965 is a reply to message #59956] Tue, 27 June 2023 12:43
Oblivion
Hi,

HtmlTools, a libtidy wrapper for U++, has been updated:

+ U++'s memory managers are enabled.
+ The TidyHtmlParser::Node class gained a ToHtmlNode() method, which allows any node to be converted to an HtmlNode.
+ The package has also been uploaded to the upp-components repo.



Best regards,
Oblivion


Re: HtmlTools package for U++ [message #59984 is a reply to message #59956] Sat, 01 July 2023 10:42
Mountacir
Messages: 49
Registered: November 2021
Member
Hi,

I tried to make something similar a couple of months ago; I was aiming for a web scraping plugin like Beautiful Soup. I managed to show amazon.com in XmlView :) but my code was such a mess that I never got the courage to publish it.

Thank you very much, Oblivion, for this package.