Re: [xml] "Proper" way to use XML (not library specific)



John Dennis wrote:
On Wed, 2007-02-14 at 11:58 -0500, Will Sappington wrote:

The company I work for is standardizing on XML as the means for
representing various types of data, our configuration files being one
of them.  We currently use what we call an application profile, just a
hierarchical structure of name/value pairs, organized in an
application/section/item hierarchy.  The functional interface to the
profile is 1) open(), which opens the file and loads it into memory,
2) execute one or more getItem()'s to pull out configuration items
specified by application-name/section-name/item-name, and 3) close()
the profile.


I'm new to XML, but based on a recommendation and my own analysis
after the fact, XPath seemed a reasonable way to replicate this
functionality in XML because it allows you to directly access specific
elements.  I've had some trouble implementing this, mostly due to a
lack of understanding of the library that was chosen (Xalan/Xerces).
In response to my troubles, my manager is saying that the problem is
with how I'm trying to use XML, not with the library.  He says XPath
is not appropriate for this, that I should "marshal" the entire XML
file into a different form, toss the DOM, and operate on the
transformed data.  I'll put his email to me, edited for brevity,
below.  But my question is, if XPath isn't appropriate for this, then
what is it appropriate for?  From the perspective of a configuration
utility, it certainly seems to be reasonable, if not obvious for the
interface to be such that the user can read specific configuration
items from the file in whatever order is desired, and XPath appears to
be designed to do specifically that with XML data.  So why would it
not be an appropriate tool to use for migrating our existing
name/value pairs (.ini files) to XML?


Bottom line, you can do it either way. Whether you marshal the data into
your own data structure or reference it directly in the DOM tree, your
first step is still to locate the data in the XML file, and XPath is a
very convenient way to do this. But it does require that you know what to
query a priori, and a lot of the time that's not the case (see the comment
about SAX below).

Whether the queries in your API reference the DOM directly via XPath, or
a private data structure you built from the DOM, is an implementation
choice dictated by your particular circumstances.

However, do consider the option of using SAX, not DOM for your purpose.
SAX is well suited to marshaling config file entries. The idea is your
SAX callbacks build the marshaled data structure as the file is parsed.
There is a tremendous advantage to this because the marshaled data
"builds itself".

It's not unusual to discover a marshaled data structure is easier to
work with once built, especially if any of the data needs to be
normalized, validated, or cross referenced in any manner.
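
To make the "builds itself" idea concrete, here is a minimal C++ sketch. It is not tied to any real parser: the ProfileBuilder class, its callback names and signatures, and the application/section/item element layout are all assumptions for illustration. A real SAX parser would call its own handler methods with its own argument types, so this only shows the shape of the technique.

```cpp
#include <map>
#include <string>
#include <vector>

// Marshaled form: application -> section -> item -> value.
using Items = std::map<std::string, std::string>;
using Sections = std::map<std::string, Items>;
using Profile = std::map<std::string, Sections>;

// Hypothetical SAX-style handler. The callback names and signatures are
// simplified stand-ins, not the API of Xerces, libxml2 or any other parser.
class ProfileBuilder {
public:
    // Called when an element opens; 'name' stands for its name attribute.
    void startElement(const std::string& /*tag*/, const std::string& name) {
        path_.push_back(name);  // remember where we are in the hierarchy
        text_.clear();
    }
    // Called for character data; parsers may deliver text in several pieces.
    void characters(const std::string& chunk) {
        text_ += chunk;
    }
    // Called when an element closes.
    void endElement(const std::string& tag) {
        // An item closes at depth 3: application/section/item.
        if (tag == "item" && path_.size() == 3) {
            profile_[path_[0]][path_[1]][path_[2]] = text_;
        }
        path_.pop_back();
        text_.clear();
    }
    const Profile& profile() const { return profile_; }

private:
    std::vector<std::string> path_;  // names of the currently open elements
    std::string text_;               // text collected for the current element
    Profile profile_;                // the structure being built
};

// Drives the callbacks by hand, in the order a parser would for one
// application ("billing") with one section ("db") and one item ("host").
inline Profile buildExample() {
    ProfileBuilder b;
    b.startElement("application", "billing");
    b.startElement("section", "db");
    b.startElement("item", "host");
    b.characters("db");
    b.characters("01");
    b.endElement("item");
    b.endElement("section");
    b.endElement("application");
    return b.profile();
}
```

By the time the last endElement() fires, the nested map is complete and no DOM was ever held in memory; buildExample().at("billing").at("db").at("host") gives the collected value.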

Hi Will,

What he said. :-)

You can expose the same top-level API in either case. However, if you do marshal the data into classes and use STL (I'm thinking maps or multimaps here), then you get a lot of pretty direct support from the STL classes. Your data is also highly mutable, if you need that sort of thing, and would be easy to un-marshal back to XML if you need to write out the config. If I needed to do that, it would tip me toward classes, because it's not very easy to modify the DOM. (I don't know about streaming APIs because I haven't used them.)
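
As a rough illustration of the maps approach, the sketch below keeps the profile in nested std::map containers, changes a value in place, and writes the whole thing back out as XML. The element and attribute names (profile, application, section, item, name) are assumed for the example, and escapeXml() and toXml() are hypothetical helpers, not part of any library.

```cpp
#include <map>
#include <sstream>
#include <string>

// Marshaled form: application -> section -> item -> value.
using Items = std::map<std::string, std::string>;
using Sections = std::map<std::string, Items>;
using Profile = std::map<std::string, Sections>;

// Escapes the characters that are unsafe in XML text and in
// double-quoted attribute values.
inline std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Un-marshals the maps back to XML text. The layout is an assumed one.
inline std::string toXml(const Profile& profile) {
    std::ostringstream out;
    out << "<profile>\n";
    for (const auto& app : profile) {
        out << "  <application name=\"" << escapeXml(app.first) << "\">\n";
        for (const auto& sec : app.second) {
            out << "    <section name=\"" << escapeXml(sec.first) << "\">\n";
            for (const auto& item : sec.second) {
                out << "      <item name=\"" << escapeXml(item.first) << "\">"
                    << escapeXml(item.second) << "</item>\n";
            }
            out << "    </section>\n";
        }
        out << "  </application>\n";
    }
    out << "</profile>\n";
    return out.str();
}
```

Changing a setting is then a plain assignment such as profile["billing"]["db"]["host"] = "db02"; followed by a call to toXml(profile) when the file needs to be rewritten.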

As in the original library selection discussion, it might come down to that "today we make cigars, but next year we're going to start making H-bombs" thing again. But then, if you used XPath and wrapped it in a class with a good API and good encapsulation, then when it's time to start making nuclear devices you could always change the underlying implementation to use classes/STL and no one would be the wiser. On the third hand, though, maybe you should just make your boss happy and use the class approach. It's a good excuse to learn to use STL if you haven't done it before, and it might get you a better raise come next review time. There are just so many factors to consider here!
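
One way to get that kind of encapsulation, sketched in C++ under some assumptions: callers see only a small AppProfile class with getItem(), and the lookup strategy sits behind an abstract backend. All of the names here (ProfileBackend, MapBackend, AppProfile) are made up for the example. Only a map-based backend is shown; an XPath-based one would implement the same lookup() by running a query against the DOM.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Anything that can resolve application/section/item to a value.
class ProfileBackend {
public:
    virtual ~ProfileBackend() = default;
    // Returns true and fills 'value' if the item exists.
    virtual bool lookup(const std::string& app, const std::string& section,
                        const std::string& item, std::string& value) const = 0;
};

// One possible backend: data already marshaled into nested STL maps.
class MapBackend : public ProfileBackend {
public:
    void set(const std::string& app, const std::string& section,
             const std::string& item, const std::string& value) {
        data_[app][section][item] = value;
    }
    bool lookup(const std::string& app, const std::string& section,
                const std::string& item, std::string& value) const override {
        auto a = data_.find(app);
        if (a == data_.end()) return false;
        auto s = a->second.find(section);
        if (s == a->second.end()) return false;
        auto i = s->second.find(item);
        if (i == s->second.end()) return false;
        value = i->second;
        return true;
    }

private:
    std::map<std::string,
             std::map<std::string, std::map<std::string, std::string>>> data_;
};

// The class callers use. It never exposes which backend is in play.
class AppProfile {
public:
    explicit AppProfile(std::unique_ptr<ProfileBackend> backend)
        : backend_(std::move(backend)) {}

    // Returns the item's value, or 'fallback' if it is not present.
    std::string getItem(const std::string& app, const std::string& section,
                        const std::string& item,
                        const std::string& fallback = "") const {
        std::string value;
        return backend_->lookup(app, section, item, value) ? value : fallback;
    }

private:
    std::unique_ptr<ProfileBackend> backend_;
};

// Small usage example: build a backend, hand it over, read one item.
inline std::string demoLookup() {
    std::unique_ptr<MapBackend> maps(new MapBackend);
    maps->set("billing", "db", "host", "db01");
    AppProfile profile(std::move(maps));
    return profile.getItem("billing", "db", "host");
}
```

Swapping the implementation later means writing another ProfileBackend subclass and changing the one line that constructs it; code that calls getItem() does not change.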

- Rush


