Re: [PATCH 2/5] connectivity: Add libxml2 as a dependency

Hi Dan.

2013/2/11 Dan Williams <dcbw redhat com>
On Mon, 2013-02-11 at 15:30 -0200, Jonh Wendell wrote:
> In these patches I want to fix the 511-http-status. As it's something
> new, of course most hotspots don't use that (including my employer).
> Almost all of them rely on 30X Moved with help of the Wispr
> 'pseudo-protocol'. It's on my TODO list to work on those scenarios.
> That would touch only code in the NMConnectivity object.
> Indeed, for those cases, we would use XML parsing, as Wispr is XML.

Thanks for the clarification.  In any case, would a regex be possible
here instead of the XML for now?  It might be less code and would
certainly be less error-prone when checking broken HTML which is quite

I'm not sure that parsing HTML with a regex is less error-prone. There are lots of traps in malformed HTML, and even in legitimate HTML constructs like

<img title="displays >" src="">

So I'd prefer to rely on libxml, which is supposed to be smarter than a simple regex.

In the patch, I'm using the flags 'HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING' so that libxml doesn't complain about broken HTML. In fact, I tested some broken HTML here and libxml parsed it successfully.
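
To illustrate the kind of trap I mean with the <img> example, here is a minimal sketch in Python (not the C code from the patch). It assumes a naive "from '<' to the first '>'" pattern, which is only a hypothetical stand-in for whatever regex we might write, and it uses the standard library's html.parser as a stand-in for a real HTML parser such as libxml. The names NAIVE_TAG, TagCollector, regex_tag and parsed_tags are made up for this example.

```python
import re
from html.parser import HTMLParser

# The legitimate HTML construct from the example above.
SNIPPET = '<img title="displays >" src="">'

# Hypothetical naive pattern: "a tag runs from '<' to the first '>'".
NAIVE_TAG = re.compile(r"<[^>]*>")


class TagCollector(HTMLParser):
    """Records each start tag with its attributes as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def regex_tag(html):
    """Return what the naive regex believes the first tag is."""
    match = NAIVE_TAG.search(html)
    return match.group(0) if match else None


def parsed_tags(html):
    """Return the (tag, attributes) pairs found by an actual HTML parser."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    # The regex stops at the '>' inside the quoted title attribute.
    print(regex_tag(SNIPPET))
    # The parser keeps the whole tag, including the src attribute.
    print(parsed_tags(SNIPPET))
```

The regex cuts the tag off inside the title attribute and never sees src, while the parser returns both attributes intact. A more careful regex could handle quoted values, but then it starts to reimplement pieces of a parser.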

Plus, as I stated earlier, libxml will be required to handle Wispr responses (and Hotspot 2.0), which are legitimate XML trees. So why not just add it as a dependency right now?

> 2013/2/11 Dan Williams <dcbw redhat com>
>         On Mon, 2013-02-11 at 17:10 +0100, Bastien Nocera wrote:
>         > On Mon, 2013-02-11 at 10:06 -0600, Dan Williams wrote:
>         > > On Mon, 2013-02-11 at 12:09 -0200, Jonh Wendell wrote:
>         > > > From: Jonh Wendell <jonh wendell oiwifi com br>
>         > > >
>         > > > libsoup already depends on libxml2 but we need to explicitly link
>         > > > to it.
>         > >
>         > > At least we already theoretically required it; though is it possible to
>         > > use GMarkup here instead of libxml2?  GMarkup would be somewhat simpler,
>         > > though it's only a subset.
>         >
>         > Given how it's used, there's probably little reason this couldn't be a
>         > regexp. Both GMarkup and libxml2 would choke on broken, slightly broken,
>         > and very very broken HTML files.
>
>         Yeah, and I've heard that for example, some hotspots literally just
>         append raw XML to the end of the HTTP request outside the XML.  I think
>         we need to be somewhat more robust here and XML parsing may not be the
>         way to get there?  Also, we may need to add special cases for various
>         hotspots, which might require regex and not just XML parsing.
>
>         Dan
> --
> Jonh Wendell

Jonh Wendell
