HtmlTools package for U++ [message #59956] |
Mon, 19 June 2023 01:22 |
Oblivion
Messages: 1093 Registered: August 2007
Senior Contributor |
Hi,
I am happy to announce that U++ is about to gain something its users have long been missing: a very powerful HTML parser/sanitizer/prettifier, the HtmlTools package.
This package is basically a libtidy binding/wrapper, bringing the power and performance of one of the oldest and most widely used HTML libraries to U++.
You can find the initial version of the source and example code here.
DONE:
+ Implemented HtmlNode class. (A modification of Upp::XmlNode class)
+ Implemented TidyHtmlParser, TidyHtmlParser::Node classes for traversing the document tree.
+ Implemented ParseHtml and RepairHtml convenience functions.
+ Added a minimal code example, parsing the legacy example.com.
TODO:
- Enable U++'s memory managers in libtidy.
- Add U++ callbacks for libtidy's message queue.
- Refactor buffer allocation code.
- Add Topic++ documentation.
- Add more example code (both console & gui).
- Test the Windows build.
- Cosmetics.
The basic example, downloading and parsing example.com:
#include <Core/Core.h>
#include <Core/SSL/SSL.h>
#include <HtmlTools/HtmlTools.h>

using namespace Upp;

void PrintHtml(const HtmlNode& node)
{
	for(const HtmlNode& q : node) {
		if(q.IsTag("title"))
			Cout() << q.GatherText();
		else
		if(q.IsTag("p"))
			Cout() << q.GatherText();
		else
		if(q.IsTag("a"))
			Cout() << "For more information, see: " << q.Attr(0) << EOL;
		PrintHtml(q);
	}
}

CONSOLE_APP_MAIN
{
	StdLogSetup(LOG_FILE);

	HtmlNode n = ParseHtml(
		HttpRequest("https://example.com/").Execute(),
		{ { "wrap", 96 } }); // libtidy options...

	PrintHtml(n);
}
Note that the package is still experimental.
Feedback is welcome.
Enjoy!
Best regards,
Oblivion
Github page: https://github.com/ismail-yilmaz
upp-components: https://github.com/ismail-yilmaz/upp-components
Bobcat the terminal emulator: https://github.com/ismail-yilmaz/Bobcat
[Updated on: Mon, 19 June 2023 01:24]