Re: [xml] xml:base missing on result from XInclude?



Salut Daniel,

Daniel Veillard <veillard redhat com> writes:

On Tue, Apr 22, 2014 at 10:11:46AM +0000, Susanne Oberhauser-Hirschoff wrote:

The xml:base is not just the directory, it also contains the file name,
right? 

  Right, but it is not needed: in that case all your files are in the
same directory, so there is no need to add an xml:base. It doesn't change
any further URI-Reference made from the included portion.

Ah, now I understand where you are coming from: you were only concerned
about external URI references going out from the xi:included portion!
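A minimal sketch of that point, using Python's standard urljoin rather
than libxml2 itself. The file:///doc/ locations are hypothetical, chosen
only for illustration: a relative reference resolves identically against
any base in the same directory, and only changes once the included file
lives in a subdirectory.

```python
from urllib.parse import urljoin

# Hypothetical locations, chosen only for illustration.
TOP = "file:///doc/1.xml"          # the including document
SAME_DIR = "file:///doc/3.xml"     # included file, same directory
SUB_DIR = "file:///doc/sub/3.xml"  # included file, subdirectory


def resolve(base, ref="x.svg"):
    """Resolve a relative URI reference against a base URI."""
    return urljoin(base, ref)


# Same directory: the base's file name does not affect the result.
print(resolve(TOP))
print(resolve(SAME_DIR))
# Subdirectory: here the xml:base is needed to keep the reference intact.
print(resolve(SUB_DIR))
```

So for reference resolution alone, the file name part of xml:base makes
no difference within one directory.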


Now I'm doing DocBook processing and I find xml:base extremely useful
to tell the origin of each part of the xi:include processed document: go
to the closest parent with xml:base defined, voilà, that's where this
part originates :-)
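A rough sketch of that lookup, written against the standard library's
xml.etree.ElementTree rather than lxml so it runs without extra packages.
The helper name origin_of and the fallback argument are assumptions made
for illustration; the sample document mirrors the xmllint output shown
further down.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:base attribute with the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def origin_of(root, node, default=None):
    """Return the xml:base of node or of its closest ancestor.

    origin_of is a hypothetical helper, not part of any library.
    ElementTree keeps no parent pointers, so build a child-to-parent map.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    while node is not None:
        base = node.get(XML_BASE)
        if base is not None:
            return base
        node = parents.get(node)
    return default


doc = ET.fromstring(
    '<top>'
    '<elem1 xml:base="2.xml">'
    '<elem2 xml:base="3.xml"><a fileref="x.svg"/></elem2>'
    '</elem1>'
    '</top>'
)

# The top-level file name is passed as the fallback for nodes that
# have no xml:base on any ancestor.
print(origin_of(doc, doc.find(".//a"), default="1.xml"))
print(origin_of(doc, doc.find("elem1"), default="1.xml"))
print(origin_of(doc, doc, default="1.xml"))
```

This only works if every included root carries its xml:base, including
those from the same directory.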

When I shuffle the included parts into subdirectories, all is well.  But
when all fragments are in the same directory, the current xml:base fixup
kicks them out :-(


The whole XInclude test suite behaves like that, see below.

  http://www.w3.org/TR/xinclude/#base

the goal was really to make sure any further URI-Reference would not
be broken.

I'm confused.  Whose goal?

So it _should_ look like this, shouldn't it?  This is what I get with
the attached patch to libxml:

### correct output ################################################
xmllint --xinclude 1.xml 
<?xml version="1.0"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
  <elem1 xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="2.xml">
  <elem2 xml:base="3.xml">
  <a fileref="x.svg"/>
</elem2>
</elem1>
</top>
###################################################################

  And with or without the xml:base, the fileref URI reference will work
correctly. On the other hand, having tons of xml:base attributes end up
in the final document is more a nuisance than a benefit, especially if
you have a lot of top-level XIncluded elements.


The lxml.etree._Element sourceline and base go directly to libxml2
xml:base.  No xml:base fixup means a useless sourceline: as the base
remains unset per the current logic, the base-sourceline combo is simply
wrong.

I see no other way than xml:base to track which file some part of the
document comes from.

The XInclude test suite agrees when run with the attached script, like
this:

###################################################################
cvs -d:pserver:anonymous@dev.w3.org:/sources/public \
   co  2001/XInclude-Test-Suite  XInclude-Test-Suite

cd XInclude-Test-Suite

python3 PATH-TO/run-tests-with-lxml.py
###################################################################

This gets about 15 fewer failures when run with the patch below, and
afaict from a review with/without the patch, there are no additional ones.

So it should be an improvement :)


  Not completely sure TBH: your test suite output will look nice, the
users' documents not so much... which one is most important?

It's not about looking nice.  If it were about looking nice, I'd try to
get rid of the duplicate xmlns:xi namespace declarations which
remain in the document after xi:include processing :-)


The problem is that with the current implementation, xml:base in libxml2
is restricted to exactly one use case: relative references to external
resources.  However, for identifying which part of the document
originates where, a correct filename in xml:base is key.



Also I'm not sure about the amount of additional information.  The
xml:base attribute is only added to the root that replaces the
xi:include.  It is *not* added to any other internal node.

In anything beyond simple test cases with tiny document fragments, the
signal/noise ratio actually improves, as every xml:base that's added is
relevant metadata.
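To put a number on that, here is a small sketch that counts how many
elements carry xml:base in an already-processed document. The sample
document, its ch1.xml / ch2.xml file names and the helper base_ratio are
made up for illustration; it assumes, as described above, that only the
root of each included fragment gets the attribute.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_ratio(root):
    """Return (elements carrying xml:base, total elements)."""
    elements = list(root.iter())
    tagged = [e for e in elements if XML_BASE in e.attrib]
    return len(tagged), len(elements)


# Hypothetical result of two xi:includes from the same directory.
doc = ET.fromstring(
    '<book>'
    '<chapter xml:base="ch1.xml"><title/><para/><para/><para/></chapter>'
    '<chapter xml:base="ch2.xml"><title/><para/><para/></chapter>'
    '</book>'
)

tagged, total = base_ratio(doc)
print(f"{tagged} of {total} elements carry xml:base")
```

The count of xml:base attributes grows with the number of xi:includes,
not with the size of the included fragments.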


I think that's also why the test suite has xml:base in _all_ places
where xi:include was processed.


On the list I saw you had your fight getting xml:base processing in at
all.  It made changes to DTDs necessary.  However, when the files are in
subdirectories (or parent directories), the xml:base will be added
anyhow, so the DTDs / schemas have to deal with xml:base already.


What are the use cases that do thousands of xi:includes of tiny XML
fragments, rendering the current tuning necessary?

If that's real I could redo a patch with an option.

Though I'd prefer not to :)

Thx,


S.

-- 
Susanne Oberhauser                     SUSE LINUX Products GmbH
+49-911-74053-574                      Maxfeldstraße 5
Processes and Infrastructure           90409 Nürnberg
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

