Re: [xml] Parsing XML in embedded environment



On 06/11/2011 04:01 AM, Liam R E Quin wrote:
> On Sat, 2011-06-11 at 01:02 +0200, David Kubicek wrote:
> > The problem is that by default, libxml knows only the basic 5 XML
> > entities.
> Why is this a problem?
That is not a problem per se. It's correct; I'm just pointing it out to provide the full background.

> XML documents must either stick to those entities or define the ones
> they want to use, so you should not predefine others.
I don't want to predefine others. I have an XML document (with a correct DOCTYPE) and a DTD that contains all the entity definitions. I just can't get libxml to load ("to know") this associated DTD before I start parsing the XML. Unfortunately, there isn't much actual documentation; it's just a list of prototypes.

> > I just can't find (or even think
> > how to add it by modifying the library) any way of supplying libxml with
> > the extra entities.
> One way might be to override the entity resolver.
Yes, I looked at it some time ago, but it doesn't seem to work and doesn't have *any* documentation. Apparently, one needs to return an "xmlParserInputPtr", which, as far as I can tell, is only created by "xmlNewInputFromFile". There is no "xmlNewInputFromMem", and "xmlNewInputFromFile" itself isn't documented. Nor could I find how to create a usable "xmlParserInputPtr" from a memory buffer.

So I dug into the libxml source and wrote my own "xmlNewInputFromMem" by mimicking "xmlNewInputFromFile". Internally, it calls something like this:

buf = xmlParserInputBufferCreateMem(mem, size, XML_CHAR_ENCODING_NONE);
input = xmlNewInputStream(ctxt);
input->buf = buf;
return input;

The real function also does the same setup and initialization that "xmlNewInputFromFile" performs. Then, in main(), before calling "xmlReadMemory", I registered my own external entity loader like this:

xmlSetExternalEntityLoader(myLoader);

The function "myLoader" prints the URL and ID it is passed, so that I can see what libxml is trying to load, for example whether it loads the DTD specified in our XML files. But after all this setup (extending the libxml source with "xmlNewInputFromMem" and registering the handler), nothing happens: myLoader() is never called, and the subsequent xmlReadMemory() prints the same errors about undefined entities.

> Why not just define the entities in the document?
We receive both the XML and the DTD from a third party, so we cannot hack around this; the solution should be fast and clean.


Thank you for your quick reply. If you could help me with the external loader, that would be great.

--
Dave Lister
