Re: [Vala] Manipulating HTML tag soup in Vala



Hello,

How about two-stage processing? Load the HTML into WebKitGTK, dump the DOM (
https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebPage.html#webkit-web-page-get-dom-document),
which already contains the parsed structure, sanitize that DOM, and then
serialize the modified DOM for display or for later use.

It should be more secure, too.

m.

2016-08-01 10:01 GMT+02:00 Michael Gratton <mike vee net>:


Hey all,

I'm looking for an HTML tag soup library for Geary, that can load tag soup
HTML (i.e. possibly malformed) from a stream, allow some manipulation of
it, and re-serialise it for display in WebKitGTK. Ideally, a pull-parser
API like libxml2's TextReader or StAX[0] would be great, so the whole
document does not need to be kept in memory as it is processed.
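
As a rough illustration of the streaming shape described above (read tag soup
in chunks, manipulate it, re-serialise it without building a full tree), here
is a minimal sketch. It is written in Python with the standard-library
html.parser module, not in Vala, and it is not one of the libraries listed in
this thread. html.parser is an event (push) parser rather than a true pull
parser, and the names StreamingSanitizer, BLOCKED_TAGS and sanitize_chunks, as
well as the filtering policy itself, are assumptions made up for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for the example: drop these elements and their content.
BLOCKED_TAGS = {"script", "style"}


class StreamingSanitizer(HTMLParser):
    """Re-serialises tag soup as it arrives, without building a DOM tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of sanitised markup produced so far
        self._skip_depth = 0   # > 0 while inside a blocked element

    def _render(self, tag, attrs, self_closing=False):
        # Hypothetical rule: drop event-handler attributes such as onclick.
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        return f"<{tag}{rendered}{'/' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self._skip_depth += 1
        elif not self._skip_depth:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in BLOCKED_TAGS and not self._skip_depth:
            self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_chunks(chunks):
    """Feed the input piece by piece and return the sanitised markup."""
    parser = StreamingSanitizer()
    for chunk in chunks:
        parser.feed(chunk)   # tags split across chunk boundaries are buffered
    parser.close()           # flush any text still held back by the parser
    return "".join(parser.out)


if __name__ == "__main__":
    soup = ["<p onclick='x()'>Hi <b>the", "re</p><script>alert(1)</scr",
            "ipt><i>unclosed"]
    print(sanitize_chunks(soup))
```

The sketch does not repair the markup (unclosed elements stay unclosed); it
only filters what passes through, so a real tag soup parser is still needed
wherever correct tree construction matters.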

These are the ones I know about:

libxml2:
- Pros: Has a pull parser API, has a HTML4 tag soup parser, installed
everywhere
- Cons: Pull parser doesn't work with HTML parser without reading whole
document into memory, HTML parser out of date(?)

GXml:
- Pros: Nice Vala API, uses libxml2 under the hood
- Cons: Not a pull parser, loads whole document into memory, doesn't seem
to be packaged for any distros, doesn't use the libxml HTML parser(?)

Others:
- WebKitGTK+: Great tag soup parser, no pull API, doesn't allow
manipulating the markup before displaying it (which is the main reason I
need to parse the HTML beforehand)
- XML Bird: Nice Vala API, but not a pull parser or a HTML parser

So none of these seem to completely fit the bill. Are there any other
options out there that I have missed? Has anyone else had to parse tag soup
in Vala?

Ta!
//Mike

[0] - <https://en.wikipedia.org/wiki/StAX>

--
⊨ Michael Gratton, Percept Wrangler.
⚙ <http://mjog.vee.net/>



