[Vala] Manipulating HTML tag soup in Vala




Hey all,

I'm looking for an HTML tag soup library for Geary: one that can load tag soup (i.e. possibly malformed) HTML from a stream, allow some manipulation of it, and re-serialise it for display in WebKitGTK. Ideally it would have a pull-parser API like libxml2's TextReader or StAX[0], so the whole document does not need to be kept in memory as it is processed.
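
As a rough illustration of the workflow described above (stream in, tweak, re-serialise, without holding the whole document), here is a minimal sketch. It is in Python rather than Vala, only because Python's standard library ships a lenient HTML tokenizer (html.parser); it is not a proposed solution for Geary. The names pull_events, serialise and strip_remote_images are hypothetical, and the manipulation shown (dropping remote "src" attributes from img tags) is just an assumed example. Comments, doctypes and processing instructions are silently dropped to keep it short.

```python
from collections import deque
from html import escape
from html.parser import HTMLParser


class _EventCollector(HTMLParser):
    """Queues tokenizer events so a caller can pull them one at a time."""

    def __init__(self):
        # Keep character references as written so text passes through unchanged.
        super().__init__(convert_charrefs=False)
        self.events = deque()

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br>; HTML has no real self-closing syntax.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, None))

    def handle_data(self, data):
        self.events.append(("text", data, None))

    def handle_entityref(self, name):
        self.events.append(("text", "&%s;" % name, None))

    def handle_charref(self, name):
        self.events.append(("text", "&#%s;" % name, None))


def pull_events(chunks):
    """Lazily yield (kind, value, attrs) events from an iterable of str chunks.

    Only the current chunk and any incomplete trailing tag are buffered.
    """
    parser = _EventCollector()
    for chunk in chunks:
        parser.feed(chunk)
        while parser.events:
            yield parser.events.popleft()
    parser.close()  # flush whatever the tokenizer was still holding
    while parser.events:
        yield parser.events.popleft()


def serialise(kind, value, attrs):
    """Turn one event back into markup."""
    if kind == "text":
        return value
    if kind == "end":
        return "</%s>" % value
    parts = [value]
    for name, val in attrs:
        # Attribute values arrive unescaped, so escape them again on output.
        parts.append(name if val is None else '%s="%s"' % (name, escape(val, quote=True)))
    return "<%s>" % " ".join(parts)


def strip_remote_images(chunks):
    """Hypothetical manipulation: drop http(s) src attributes from <img> tags."""
    for kind, value, attrs in pull_events(chunks):
        if kind == "start" and value == "img":
            attrs = [
                (n, v) for n, v in attrs
                if not (n == "src" and v and v.lower().startswith(("http:", "https:")))
            ]
        yield serialise(kind, value, attrs)


if __name__ == "__main__":
    # The <img> tag is deliberately split across two chunks, and <b> is never closed.
    soup = ["<p>Hi <b>there<img sr", "c='http://example.com/x.png' alt=x></p>"]
    print("".join(strip_remote_images(soup)))
```

Note that html.parser is really a push (callback) tokenizer; the generator only makes it look like a pull API from the caller's side, which is the shape being asked for here.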

These are the libraries I know about:

libxml2:
- Pros: Has a pull parser API, has an HTML4 tag soup parser, installed everywhere
- Cons: Pull parser doesn't work with the HTML parser without reading the whole document into memory, HTML parser out of date(?)

GXml:
- Pros: Nice Vala API, uses libxml2 under the hood
- Cons: Not a pull parser, loads whole document into memory, doesn't seem to be packaged for any distros, doesn't use the libxml HTML parser(?)

Others:
- WebKitGTK+: Great tag soup parser, no pull API, doesn't allow manipulating the markup before displaying it (which is the main reason I need to parse the HTML beforehand)
- XML Bird: Nice Vala API, but not a pull parser or an HTML parser

So none of these seems to completely fit the bill. Are there any other options out there that I have missed? Has anyone else had to parse tag soup in Vala?

Ta!
//Mike

[0] - <https://en.wikipedia.org/wiki/StAX>

--
⊨ Michael Gratton, Percept Wrangler.
⚙ <http://mjog.vee.net/>



