Re: [xml] Migration from MS XML and minimal build questions...



On Sun, Jun 25, 2006 at 10:46:18AM +0100, David Kelvin wrote:
On 24/06/06, Daniel Veillard <veillard redhat com> wrote:



Okay. To me your approach sounds exactly like this:
  "I don't want to invest learning libxml2 API, it's just a quick hack
   until I get what I want, but please give me the code which will do
   this"


Daniel,

Sorry you feel that - I don't think it is quite true.  I am relatively new

  okay, maybe I misinterpreted your first mail.

[...]
I have looked at the libxml2 API (honest)
but it is rather like reading a telephone directory - it has all the
information you will ever need but doesn't tell you how to make a call!  [As
an aside - it would be nice to have the whole documentation in a PDF file!
Easier to print, especially on A4.]

  Hum, I would never try to print it; to me it makes things way slower, as you
can't search for information easily.

I have been in the IT business more years than I care to admit (at least 30)
and I learn much more from seeing working examples.  Even skeleton examples
(giving the idea but not actually all the coding) help me.  The API

  understood, that's why I tried to provide C examples online and have
them fully documented and linked to the documentation.

documentation comes in when you already "know" what you want/need and just
need the details to code it.  Well this is my way of working - sorry I am
probably too old to change and if it isn't your way.

  Copying and modifying is easier for everybody. Sometimes you just can't
provide examples for everything. Sometimes you prefer not to; I honestly
prefer to see people use the xmlReader (based on the Microsoft reader API, actually)
rather than SAX.

That code exists, in C form, basically when running
 xmllint --sax --schema foo.xsd foo.xml
it will do exactly what I guess you are asking below. Of course xmllint
has far more code than what you ask for, but it's there.


Thank you for this.

As to the features that you mention as "default behaviour", I couldn't find
this (the documentation of the API is very large and I could have missed
this).  Can you tell me where is it documented please?

  It's the default behaviour of xmllint: non-validating and not fetching the DTD,
both of which are optional from the XML-1.0 spec point of view; it's also the
least risky behaviour. Everything except XML namespace support is optional in
the parser (and that last one is flexible: libxml2 still parses documents which
don't conform to the Namespaces in XML Recommendation, it just signals that as
an error, but not a fatal error).
  Default behaviour is basically driven by the XML specs: only what is
mandatory is switched on by default.


If anyone has an example of using SAX2 to parse an XML file with the above
features/properties - I would be very very grateful to look at it, i.e.
a. validate the XML file with an external XSD file (that I specify at run
time)

it exists in xmllint.c as part of the distribution, but you will have
to pick the right code:
around line 3318 there is code to generate a parsed schema
around line 1614 there is a testSAX function that you should look at too


Thank you - I will look.


I am sure that my current "startDocument"/"endDocument",
"startElement"/"endElement"  and "characters" routines can be quickly
converted from MSXML to libxml2.

Not sure. SAX/SAX2 is not defined for C; everybody implemented it their own
way.


The logic will be the same (I hope) just the calling parameters/structure
and return information.

  may differ, yes.

Hopefully, the error and warning routines
will also convert OK.

unlikely either


Again, hopefully the logic will be same.

  Since I don't know the logic of MSXML I can't tell. The main factor is
conformance to XML-1.0 spec sections on error handling.

many examples that newcomers like me can use, although the API documentation
is very good and detailed - it is just putting it together that is unclear.
Some examples haven't been updated since SAX1.
On purpose. I think SAX is a very bad API for beginners, so I carefully
avoid sending them in that direction.


I actually disagree with you in the case where one (me) needs to process a
record  at a time (my xml file is an export of a type of database, each
record is written as an element with each field as a separate embedded
element) rather than load the whole thing into a DOM memory just to import
to a new database structure (if I understand the difference between SAX &
DOM).

  The reader interface does not need to load the full DOM; it's a streaming
interface present in libxml2 and Microsoft .NET
   Two clicks away from the main page:
     http://xmlsoft.org/ 
       Developer menu ->
     http://xmlsoft.org/docs.html
       The reader interface
     http://xmlsoft.org/xmlreader.html

  it's slightly slower than SAX2 but the programming model is way cleaner.
And it may be familiar to people coding XML with recent Microsoft APIs.

Thank you for your time in responding and on pointers/details.  I will
follow up and I will read/learn the libxml API (well not all of it!).

   I know the huge API size is a problem. But I think it's the result of
 trying to fulfill the requests of thousands of developers, plus some of my
 initial mistakes. It is simply impossible to write down examples for each
 and every kind of use case, well, unless I get 10 man-years for this, and I
 certainly don't, too bad. I take contributions and examples to be added back.

PS. Yes - if I get it working I will try and reduce the library.  As you
know (certainly by now!), I do only need a minimal set of the large
functionality of libxml2.  From what I have seen of the changes to libxml2,
they are mostly in new functionality rather than the support of parsing and
validation.

  XSD validation is horribly difficult and is only starting to be finalized;
it accounts for a large part of the changes in recent releases.

It would not be necessary to keep fully up to date.  Anyway, as
long as there are no changes to the default behaviour for parsing/validation
or their API, a new full version of libxml2 could be used "as is", as/when
it is released even if I haven't produced a minimal version.

  Let's say that by selecting SAX2 you picked a hard API, and by selecting
XSD you selected a very hard specification to implement (you being whoever
did the initial design), if you want fun, print XSD Schemas part 1 and read it,
then we can argue about the limits and obfuscation of libxml2 documentation ;-)

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


