Re: [xml] How can I parse an XML file whose filesystem path is a Unicode string?



Hi Liam,

 

I misdiagnosed the problem. The actual problem seems to be that the XML file I am parsing declares a file entity whose path contains a non-ASCII Unicode character (£) that needs to be escaped.

 

Here is the XML I am trying to parse:

 

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "W:/matlab/sys/namespace/docbook/v4/dtd/docbookx.dtd" [

<!ENTITY sect-002  SYSTEM "./uc£_html_files/image-000-chapter.xfrag">

]>

<book lang="en">

<?dbhtml filename="uc£.html"?>

<bookinfo><title></title><subtitle></subtitle><pubdate>31-Jul-2022 11:08:41</pubdate></bookinfo>&sect-002;</book>

 

Here is the error returned by the parser.

 

"Entity 'sect-002' failed to parse\n"

 

The parser escapes high-order characters in the URL for the main XML file but apparently does not do the same for file entities declared in the DTD.
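To make clear what I mean by escaping, here is a minimal sketch in plain C. The helper escape_high_bytes is hypothetical (it is not a libxml2 function), and I am assuming the system identifier is UTF-8 encoded, as the XML declaration above says. I have not confirmed that this is exactly what the parser does internally for the main file URL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2: percent-escape every byte
 * outside the 7-bit ASCII range of a UTF-8 encoded path. ASCII bytes,
 * including '/', '.', and '%', are copied unchanged.
 * Returns 0 on success, -1 if the output buffer is too small. */
int escape_high_bytes(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        if (*p >= 0x80) {
            /* Need room for "%XX" plus the terminating NUL. */
            if (o + 3 >= out_size)
                return -1;
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        } else {
            /* Need room for one byte plus the terminating NUL. */
            if (o + 1 >= out_size)
                return -1;
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Applied to the system identifier of sect-002 in UTF-8, where £ is the two bytes C2 A3, this would yield ./uc%C2%A3_html_files/image-000-chapter.xfrag.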

 

I am currently trying to convert a Xerces-c/Xalan-c application to libxml2/libxslt, because Xalan-c is unable to execute the DocBook FO stylesheet. My Xerces-c implementation uses a custom entity resolver to resolve file entities. I might need a custom entity resolver to fix the problem with the libxml2 implementation. However, libxml2 does not seem to support custom entity resolvers. At least, I have not been able to find this feature in the documentation or in the libxml2 code base on GitHub.

 

I would appreciate any help you can give in finding a solution.

 

Regards,

 

Paul

 

 

From: Liam R E Quin <liam holoweb net>
Sent: Saturday, July 30, 2022 4:02 PM
To: Paul Kinnucan <paulk mathworks com>; xml gnome org
Subject: Re: [xml] How can I parse an XML file whose filesystem path is a Unicode string?

 

On Sat, 2022-07-30 at 17:15 +0000, Paul Kinnucan via xml wrote:
> Hi,
>
> I need to parse XML files whose paths may contain Unicode characters,
> for example,
>
> W:\jtbug\uc£\mydoc£.xml
>
> What is the best way to do this with libxml2?

Sounds like you are using Microsoft Windows and are going to use the C
API? How far have you got? What problems are you having exactly? What
errors do you get?

--
Liam Quin, https://www.delightfulcomputing.com/ https://www.paligo.net/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Antique illustrations, stock images, text:  http://www.fromoldbooks.org


