Re: [xml] Undefined character entities and libxml



On 12/17/05, Daniel Veillard <veillard redhat com> wrote:
On Fri, Dec 16, 2005 at 07:55:07PM -0500, Jon Smirl wrote:
I have things working now.

The help/man for --loaddtd wasn't enough for me to figure out what it did.
        --loaddtd : fetch external DTD

Second, the ruby wrapper for loaddtd,
XML::Parser::default_load_external_dtd, was broken. I've sent a patch
to the maintainer. This was causing me a lot of problems: I set the
option, but it was not taking effect because of the breakage in the C
wrapper.

Needing to install the DTD and use --loaddtd to fix the undefined
entity error was not obvious. It might make a good entry for your
libxml FAQ. Google didn't turn up a ready answer either.

  Hum, how would you phrase this ?

Put the error message into the FAQ so that Google will pick it up:
root.xml:43: parser error : Entity 'nbsp' not defined

I was searching for variations of "libxml entity not defined" and
couldn't get a good hit.

Then add a paragraph about what an external subset is and why you need
to install a DTD in order to resolve the entities. Knowledge of
external subsets is probably not common. Most XML files don't use them
and most of the HTML parsers build the entities in. It's only XHTML
that commonly needs the external subset.
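
To make the point about entities and the DTD concrete, here is a
minimal sketch. It assumes Python's standard-library ElementTree
(which sits on expat, not libxml) only so that the example is
self-contained, and the two document strings are made up. XML itself
predefines just lt, gt, amp, apos and quot, so nbsp has to be declared
somewhere. The sketch declares it in an internal subset for brevity;
the XHTML DTDs declare it in the external subset, which is why the
parser has to load the DTD.

```python
import xml.etree.ElementTree as ET

# No DTD at all: XML predefines only lt, gt, amp, apos and quot,
# so the parser has no definition for nbsp.
undeclared = "<p>a&nbsp;b</p>"

# Same content, but nbsp is declared in a DTD subset (internal here,
# purely to keep the example in one string).
declared = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def try_parse(text):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None


undeclared_result = try_parse(undeclared)  # parse error, so None
declared_result = try_parse(declared)      # entity expands to U+00A0
```

The first parse fails in the same way as the libxml error quoted
above; the second succeeds only because a declaration for nbsp is
visible to the parser.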

A warning indicating that a DTD is being fetched over the net instead
of from the catalog would probably be good too. w3.org is so busy that
sometimes the DTDs don't come back. Some of that load may come from web
sites that run parsers without knowing that the DTDs are being fetched.
With the ruby wrapper there is no indication that this is happening
other than a slow parse.

My current problem is that my XHTML xpath queries aren't matching.
This seems to be because the namespace associated with the query isn't
being set into libxml correctly by the ruby wrapper when the namespace
is the default one. I'm still debugging it.
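
For anyone hitting the same symptom, the underlying rule can be
sketched as follows. This again assumes Python's standard-library
ElementTree rather than libxml or the ruby wrapper, and the sample
document and the prefix "x" are made up. In XPath 1.0 an unprefixed
name in a query means "no namespace", so elements that inherit the
XHTML default namespace only match once a prefix is bound to that
namespace URI. In libxml's C API that binding is normally done with
xmlXPathRegisterNs on the XPath context; how the wrapper exposes it is
not shown here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>hello</p></body>"
    "</html>"
)
root = ET.fromstring(doc)

# An unprefixed name in the query means "no namespace", so it does not
# match elements that inherit the XHTML default namespace.
unprefixed = root.findall(".//p")

# Binding a prefix (the name "x" is arbitrary) to the namespace URI
# makes the same query match.
prefixed = root.findall(".//x:p", {"x": XHTML_NS})
```

The prefix used in the query does not have to appear in the document;
only the namespace URI has to agree.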

The xhtml 1.1 DTD is huge, fifty files. Is there some way to set
things up so that libxml can use a small DTD which only contains the
external subset in non-validating mode and then use the full one for
validating? I'd rather not parse 10,000 lines of DTD just to read a 20
line xhtml file.

  set an entity resolution handler (see the section on I/O in the doc)
and catch the request for XHTML1.1 then provide a reduced input.
XHTML-1.1 being a bit nebulous is probably one of the reasons it's
not very common.
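
A rough sketch of that idea follows. It is written against Python's
bundled expat binding, not libxml, as an assumption to keep it
self-contained; in libxml the corresponding hook would be an external
entity loader (xmlSetExternalEntityLoader in the C API). MINIMAL_DTD,
the sample document and the function names are hypothetical. The
handler catches the request for the XHTML 1.1 public identifier and
answers it with a one-line DTD instead of fetching the real one.

```python
from xml.parsers import expat

# Hypothetical reduced DTD: only the entity this document needs.
MINIMAL_DTD = '<!ENTITY nbsp "&#160;">'

XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"

DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>a&nbsp;b</p></body></html>"
)


def parse_with_reduced_dtd(document):
    """Parse document, answering the XHTML 1.1 DTD request locally."""
    chunks = []    # character data seen in the document
    requests = []  # public identifiers the parser asked for
    parser = expat.ParserCreate()
    # Ask expat to request the external subset named in the DOCTYPE.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def resolve(context, base, system_id, public_id):
        requests.append(public_id)
        if public_id != XHTML11_PUBLIC_ID:
            return 0  # refuse anything else; nothing is fetched
        # Feed the reduced DTD in place of the real external subset.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(MINIMAL_DTD, True)
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks), requests


text, requests = parse_with_reduced_dtd(DOC)
```

Nothing is read from the network: the system identifier is never
opened, and the only declaration the parser sees is the one in
MINIMAL_DTD. A reduced DTD like this is only suitable for
non-validating parses.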

I'll give this a try.

A lot of people using XHTML may not have the knowledge to set
something like this up. It would be nice if there were rpms for
xhtml1.0 and xhtml1.1 that set up the DTDs in the catalog for
validation and also set up a minimal external subset for performance.
You'd have to modify the catalog mechanism to return the full DTD or
minimal subset depending on which mode the parser was in.


Daniel

--
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



--
Jon Smirl
jonsmirl gmail com


