Re: [xml] Need example / help for using SAX parsing on IO stream (socket)

From: Daniel Veillard <veillard redhat com>
To: Rich Salz <rsalz datapower com>
Cc: "xml gnome org" <xml gnome org>
Subject: Re: [xml] Need example / help for using SAX parsing on IO stream (socket)
Date: Thu, 14 Jul 2005 04:51:16 -0400

On Wed, Jul 13, 2005 at 10:36:01PM -0400, Rich Salz wrote:

>> As I said repeatedly, stacking multiple XML documents within a
>> single stream is a design mistake.
>
> I don't disagree with you.  But have you looked at the Jabber protocol? :(


  Very different: it is a single document, so you don't need extra
markers. On the other hand, it doesn't play very well with TCP/IP in
the sense that you can't pipeline, i.e. you need all the data, up to
the very last byte pushed by the other side, before being able to
process it and generate an answer. If you have fragmentation you may
end up hitting timeouts all the way between the sender and the receiver,
on all the intermediaries in the TCP connection (but that's mostly due
to the nature of the IM traffic pattern anyway, so not a problem in a
Jabber context). At least it doesn't require opening and closing a new
socket (with slow start and congestion effects) for every exchange like
HTTP-1.0 was doing :-), but you also need to keep that socket open
even if the client becomes silent for a long time (and there HTTP-1.1
wins overall).
  Jabber also assumes that an XML parser which is pushed N bytes of
data will always process all of that data immediately; the parser
can't wait for more data, and that disables some potential optimizations,
at least in libxml2.
  On the other hand, Jabber doesn't force you to reinitialize the parser
state for every chunk of data processed. Reinitializing is not a big deal
if your documents are large, but it can be a real performance hit if you
are just exchanging IM-like messages (libxml2 2.6 has the xmlCtxtReadxxx
APIs to be able to reuse an existing parser as much as possible). You
have a trade-off between the amount of data kept per connection and the
amount of work needed to process one chunk of data; the Jabber approach
is to keep a parser context for every existing client while minimizing
the CPU usage needed to process any one of them.
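
  The single-document approach described above can be sketched roughly
as follows. This is only an illustration under stated assumptions: it
uses Python's standard expat binding rather than libxml2, and the
StreamSession class, the element names and the chunk boundaries are all
made up for the example. One parser is kept per connection, bytes are
pushed as they arrive, and a message counts as complete when a direct
child of the root element closes.

```python
import xml.parsers.expat


class StreamSession:
    """One long-lived push parser per connection (hypothetical helper)."""

    def __init__(self):
        self.depth = 0
        self.stanzas = []  # names of completed direct children of the root
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self._start
        self.parser.EndElementHandler = self._end

    def _start(self, name, attrs):
        self.depth += 1

    def _end(self, name):
        self.depth -= 1
        if self.depth == 1:
            # A direct child of the root just closed: one complete message.
            self.stanzas.append(name)

    def feed(self, data):
        # Push whatever bytes the socket delivered; the document goes on.
        self.parser.Parse(data, False)

    def close(self):
        # The peer ends the session by closing the root element.
        self.parser.Parse(b"</stream>", True)


session = StreamSession()
# Chunks split at arbitrary byte boundaries, as TCP might deliver them.
for chunk in (b"<stream><mess", b"age>hi</message><pre", b"sence/>"):
    session.feed(chunk)
session.close()
```

The parser state is created once and reused for every message, which is
the trade-off mentioned above: memory is held per connection, but no
setup cost is paid per message.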

  If the protocol can easily embed markers between documents within
the stream and the documents are not tiny, then it's a reasonable design.
But if the documents are tiny, you have no markers, or detecting markers
needs a preparse layer, then a Jabber-like single-document approach sounds
better to me. There are a lot of trade-offs involved in picking one
solution or another; it depends a lot on the kind of traffic, the number
of clients and the real constraints. Sometimes stacking, but with clear
markers on the stream, is the best solution, but pushing concatenated
documents is a horror.
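
  For comparison, here is a minimal sketch of the stacking-with-markers
approach. It again uses Python's standard expat binding instead of
libxml2, and everything specific is an assumption for illustration: the
NUL byte as marker, the MarkerFramer class and the root_name helper are
not taken from any real protocol. Each document found between markers
gets a fresh parser, and the last lines show that plain concatenation
without markers is rejected as not well-formed.

```python
import xml.parsers.expat

# Hypothetical separator; assumes UTF-8 documents, where a NUL byte
# never occurs inside the XML itself.
MARKER = b"\x00"


def root_name(document):
    """Parse one complete document with a fresh parser, return its root name."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names[0]


class MarkerFramer:
    """Buffers socket data and returns the documents found between markers."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        self.buffer += data
        # Everything before the last marker is complete; keep the remainder.
        *complete, self.buffer = self.buffer.split(MARKER)
        return [doc for doc in complete if doc]


framer = MarkerFramer()
roots = []
for chunk in (b"<a><x/></a>\x00<b", b"/>\x00<c>te", b"xt</c>\x00"):
    for document in framer.feed(chunk):
        roots.append(root_name(document))

# Without markers, the second root element makes the input ill-formed.
try:
    root_name(b"<a/><b/>")
    concatenated_ok = True
except xml.parsers.expat.ExpatError:
    concatenated_ok = False
```

Here the per-document parser setup is the cost paid for keeping no
parser state between documents, so this layout suits documents that are
not tiny.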

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

