Re: GJS: How can I use Regex (parse HTML)?

On Mon, 24 Jun 2019 at 21:15, Tony Houghton <h realh co uk> wrote:

> On Mon, 24 Jun 2019 at 18:56, makepost <makepost firemail cc> wrote:
>
> > Regex and MarkupParser from GLib won't work because they don't have
> > constructors compatible with GObject introspection, so only their helper
> > methods such as escape or match_simple are visible to Gjs.
>
> Do you mean because they have non-standard ref and unref funcs? They should still be usable from gjs. They don't seem to have unref-func and ref-func annotations, but perhaps bindings can automatically infer that from the method names being unref and ref? If not, it's a bit shocking that nobody's added the annotations in all this time, but the worst that could happen is a memory leak.

GMarkupParseContext has ref/unref functions, and it even has a boxed GType, so the type system knows how to copy and free instances. The problem is GMarkupParser, the struct you fill in to define your own parser.

The issue is that GMarkup is a very, very C-oriented, low-level API. It really doesn't work in a way that can be safely bound to other languages: from the function pointers in a struct, to the variadic arguments of the attribute collection function (g_markup_collect_attributes()), to the single user_data pointer, with a single scope, shared by all the callbacks. At most you can pass around GMarkupParseContext instances, but implementing a GMarkupParser to go along with them is not possible.

Additionally, GMarkup is *emphatically* not a generic XML parser. As its documentation *clearly* states, GMarkup is "intended to parse a simple markup format that's a subset of XML", and it "must not be used if you expect to parse untrusted input". This means you should only ever use it to parse an XML fragment you have created yourself, or a *very* well-defined subset of XML. In other words: it's fine for parsing a cache file or a configuration file, but as soon as you put non-validated user input in the middle of it, you cannot expect things not to blow up.

The expectation is that anything requiring a proper XML parser would use a dedicated library, like libxml2, which various languages have bindings for. Sadly, libxml2 is not introspectable because it doesn't use GLib or GObject, so you'd have to write your own GObject wrapper around it and then use the introspection data generated from that. There is GXml, a library written in Vala on top of libxml2, that exposes a C ABI and introspection data:

  https://gitlab.gnome.org/GNOME/gxml

Ciao,
 Emmanuele.