Re: [xml] DOM Base URI (xml:base, RFC 2396)



Hey all,
I'm causing trouble for Richard by asking for things like:
http://bugs.php.net/bug.php?id=44367

Basically, what happens in the following scenarios with the baseURI of
a document?

 1. A document is loaded from a URI (http://foo.com/)
 2. An xhtml document is loaded from a URI (http://foo.com/), but has
a <base href="http://bar.com/" />
 3. An xml document is loaded from a URI, but has a <Foo
xml:base="http://bar.com/" />
 4. An xml document is loaded from a URI, which was redirected (GET
http://foo.com/ redirected to http://bar.com/)
 5. An xml document is loaded, and has an xml:base attribute - but
it's not on the root element (/Foo/bar[@xml:base])


From what I read of http://www.faqs.org/rfcs/rfc2396.html, section 5.1
& on, I think it should be:

1. http://foo.com/
2. http://foo.com/
2a. Unless the implementation understands xhtml / html - http://bar.com/
3. http://bar.com/
4. http://bar.com/
5. http://foo.com/
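
To make scenarios 3 and 5 concrete, here is a minimal Python sketch of how xml:base is inherited by elements, as I understand the XML Base rules. The function name element_base_uris and the sample documents are made up for illustration, and this is not a description of what libxml2 or PHP does internally. It only reports per-element base URIs; what the document node itself should report is exactly the point in question.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def element_base_uris(xml_text, retrieval_uri):
    """Return (tag, base URI) pairs in document order.

    Hypothetical helper: each element inherits its parent's base URI
    (the retrieval URI for the root) unless it carries xml:base, which
    is resolved against the inherited value.
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, inherited):
        # urljoin returns `inherited` unchanged when xml:base is absent.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        pairs.append((elem.tag, base))
        for child in elem:
            walk(child, base)

    walk(root, retrieval_uri)
    return pairs


# Scenario 3: xml:base on the root element.
print(element_base_uris('<Foo xml:base="http://bar.com/"/>', "http://foo.com/"))
# [('Foo', 'http://bar.com/')]

# Scenario 5: xml:base on a child only; the root keeps the retrieval URI.
print(element_base_uris(
    '<Foo><bar xml:base="http://bar.com/"/></Foo>', "http://foo.com/"))
# [('Foo', 'http://foo.com/'), ('bar', 'http://bar.com/')]
```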


The current behavior of PHP (using libxml2 2.6.31) does not match this list.


Additionally, there are a number of GRDDL (a W3C TR) tests which
explicitly expose these kinds of behaviour - and the expected test
results marry up to the behaviour outlined above.

See also: http://www.w3.org/TR/grddl-tests/#htmlbase1


On Thu, Mar 13, 2008 at 11:32 PM, Rob Richards <rrichards php net> wrote:
Hi Daniel,

 I'm taking this off the PHP bug system because, if it were a bug (I
 still say it's not), it would end up being a libxml2 bug and would need
 to be taken up there.

 Anyway, back to the issue at hand.

 When you went through the points in xml:base, you forgot the piece:
 "The base URI of a document entity or an external entity is determined
 by RFC 2396 rules, namely, that the base URI is the URI used to
 retrieve the document entity or external entity."

 GRDDL might be attempting to address xml:base issues; however, any
 resolutions it comes up with pertain to GRDDL and any specs referencing
 GRDDL. DOM is generic and, based on its specs, follows the XML Infoset,
 XML Base and RFC 2396 specs to determine base uri. Using document
 content to determine a base uri is dependent upon the media type. For
 instance, the text/html media type (HTML) can use a BASE element to
 determine base uri. DOM strictly works with either XHTML or XML
 (excluding HTML here to be able to talk about this generally). The XML
 specs themselves do not specify any such special way to determine base
 uri, hence it resorts to the quote I mentioned above.

 Now, if you are working with GRDDL, the base uri of a document may
 indeed be dependent upon the document element. So, when writing
 applications to work with GRDDL, you must use the base uri property of
 the document element. This does not change the fact that you are
 still using DOM, so the base uri of the DOMDocument is dependent
 upon the specs DOM uses to determine base uri.

 After all this, I still could be completely wrong, though I do not
 believe so. If you would like to continue this discussion, I would suggest
 you CC the libxml2 dev list (xml gnome org) as that would provide much
 more input on the subject. You might also want to check with the Xerces
 devs (another parser I tend to use for comparison), as they are
 probably much more responsive to questions than Microsoft :)

 Rob






