Re: [xml] xmlParseFile;encode - newbie question



On Wed, Feb 23, 2005 at 01:01:04PM +0100, Baurzhan Ismagulov wrote:
On Wed, Feb 23, 2005 at 01:21:05PM +0200, Pieter Louw wrote:
That is my question, how do I convert the & to &amp;?

Or is there no function to do this in libxml?

As Daniel said, "Whatever generating this must be fixed".

If you have to fix this yourself, try something like sed 's/&/&amp;/'
and parse the resulting file.

That won't work.  Transforming it like that will break any correctly
specified entities: an existing &lt; becomes &amp;lt;.  The cleanest way
I can think of to handle this is to have a "forgiving" mode for libxml2
that treats bad entity references as plain text.  If you really want to
do it with sed instead, you need a multi-step translation.  Something
like this:

sed -E -e's/X/XX/g' \
        -e's/&(lt|gt|apos|quot);/X\1;/g' \
        -e's/&amp;/\&/g' \
        -e's/&/\&amp;/g' \
        -e's/([^X])X(lt|gt|apos|quot);/\1\&\2;/g' \
        -e's/^X(lt|gt|apos|quot);/\&\1;/g' \
        -e's/XX/X/g'

That'll turn this example:
&quot; Xlt; &lt; &amp; &&amp;& &lt foo & bar XX
into this:
&quot; Xlt; &lt; &amp; &amp;&amp;&amp; &amp;lt foo &amp; bar XX

Of course, this assumes you don't have any custom entities defined.  If
you do then hacking libxml to accept slightly malformed input will
probably be easier than trying to preprocess it.
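
For the predefined-entities-only case, here is a shorter sketch of the same
idea that needs no placeholder character: escape every & first, then undo the
escape wherever the & started a valid reference.  The function name
escape_bare_amps is made up for illustration, and accepting numeric character
references (&#38; and &#x26; style) is my own addition, not something the
script above does.  As far as I can tell it also copes with two entities
placed back to back, such as &lt;&gt;, which the placeholder version appears
to miss because the [^X] match consumes the preceding character.  The same
caveat about custom entities applies.

```shell
#!/bin/sh
# Sketch only: escape bare '&' characters in text that is meant to be XML.
# escape_bare_amps is a hypothetical helper name, not part of libxml2.
escape_bare_amps() {
    # Pass 1: escape every '&', including those that begin valid references.
    # Pass 2: undo the escape where the '&' began one of the five predefined
    #         entities or a numeric character reference.
    sed -E -e 's/&/\&amp;/g' \
           -e 's/&amp;(amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);/\&\1;/g'
}

# Usage: reads stdin, writes stdout.
printf '%s\n' '&quot; Xlt; &lt; &amp; &&amp;& &lt foo & bar XX' | escape_bare_amps
```

Run against the earlier example line, this should leave &quot;, &lt; and
&amp; alone and escape only the bare ampersands, including the one in the
unterminated "&lt".  It is still a workaround; fixing whatever generates the
file remains the better option.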

eric


