[xml] xmlCharEncInFunc usage -or- stdin parsing?



Newbie question - 

I have to read input of unknown encoding from stdin and turn it into a
readable message.  The message is sent raw (no DTD or schema, no
protocol, no file boundaries) across the network to a TCP port.
I am using tcpsvd to provide the TCP interface for now, but I am
tempted to write my own.

The message IS firmly defined and structured with proper start and end
tags which are unique.   

The incoming encoding is most likely to be UTF-16, but it can be any
supported encoding.

My choice was to set up a buffer on a ramdisk which I would transcode
into, basically grabbing 6 characters at a time from stdin and checking
if I've got the end-tag (which is 14 chars long).  
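
A minimal sketch of that check, for illustration only.  It assumes the
bytes have already been transcoded to UTF-8 (in raw UTF-16 the tag
bytes would be interleaved with zero bytes), and the names tag_scanner,
tag_scanner_init, tag_scanner_feed and the tag "</end-message>" are
made up for the example.  It keeps the last 14 bytes seen, so an
end-tag that straddles several 6-byte reads is still found:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_MAX 64  /* longest end-tag this sketch supports */

typedef struct {
    const char *tag;       /* end-tag to look for (hypothetical) */
    size_t tag_len;        /* length of the end-tag in bytes */
    char window[TAG_MAX];  /* the most recent tag_len bytes seen */
    size_t filled;         /* how many bytes of window are in use */
} tag_scanner;

/* Prepare a scanner for one end-tag.  Returns 0 on success, -1 if the
 * tag is empty or longer than TAG_MAX. */
static int tag_scanner_init(tag_scanner *s, const char *tag)
{
    size_t n;

    assert(s != NULL && tag != NULL);
    n = strlen(tag);
    if (n == 0 || n > TAG_MAX)
        return -1;
    s->tag = tag;
    s->tag_len = n;
    s->filled = 0;
    return 0;
}

/* Feed one chunk of already-transcoded bytes.  Returns the number of
 * bytes of this chunk up to and including the last byte of the
 * end-tag, or 0 if the end-tag has not been completed yet. */
static size_t tag_scanner_feed(tag_scanner *s, const char *chunk, size_t len)
{
    size_t i;

    assert(s != NULL && chunk != NULL);
    for (i = 0; i < len; i++) {
        if (s->filled == s->tag_len) {
            /* window is full: drop the oldest byte */
            memmove(s->window, s->window + 1, s->tag_len - 1);
            s->filled--;
        }
        s->window[s->filled++] = chunk[i];
        if (s->filled == s->tag_len &&
            memcmp(s->window, s->tag, s->tag_len) == 0) {
            s->filled = 0;  /* ready for the next message */
            return i + 1;
        }
    }
    return 0;
}
```

Anything after the returned offset in the final chunk would belong to
the next message and has to be kept, not thrown away.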

I cannot let it spin against stdin; it has to block waiting.  When I
see that my six characters are part of the end-tag I'll stop reading,
write the remainder of the end-tag to the file myself, close and
re-open the file to parse it, then go back to reading stdin and block.
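
A sketch of the blocking read this relies on, assuming stdin is in its
default blocking mode (O_NONBLOCK not set; whether that holds for the
descriptor tcpsvd hands over is an assumption).  The name read_some is
made up for the example.  It returns as soon as at least one byte is
available instead of waiting for a full six, so the tail of a message
is not held back waiting for bytes that may never arrive:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Block until some input is available on fd, then read up to `want`
 * bytes into buf.  Returns the number of bytes read (1..want), 0 at
 * end of input, or -1 on error.  A read interrupted by a signal is
 * retried. */
static ssize_t read_some(int fd, char *buf, size_t want)
{
    ssize_t n;

    assert(buf != NULL && want > 0);
    do {
        n = read(fd, buf, want);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Called as read_some(STDIN_FILENO, buf, 6), this sleeps in the kernel
rather than spinning while no data is pending.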

The problem is that the incoming chars can be any encoding at all and
cannot be recognized trivially using simple C. 

I attempted to set up the encoding using 

  thisDocEncoding = xmlDetectCharEncoding(inputBuff, SBUF_SIZE);
  encodingPtr = xmlGetCharEncodingHandler(thisDocEncoding);


and then using 

  result = xmlCharEncInFunc(encodingPtr, &mybuff, &inputbuff);

but this works only the first time it is called.  All subsequent
calls fail (result = 0, with inconsistent results).

I zeroed out the buffers between each use without changing this
result.   A possible problem is that the xmlBuffer structs require
the buffer content to be UTF-8, according to the documentation.

But...

xmlCharEncodingInputFunc(myBuffer,&outCount,inputBuff,(SBUF_SIZE));

requires I rewrite an encoder?  

I know that this normally happens automagically on input, and I
don't have to worry about it IFF I have the input set up as a file.  I
have found, however, that when I use it against stdin the I/O does not
block and the parser doesn't know how or when to stop parsing (there
is no way to tell it to quit short of program exit).  I can't exit the
program because it has to get the NEXT message, which might be related
to the first... and so on.

How can I use the existing tools to do this (supposedly simple) 
task?    Is there a simpler way of dealing with stdin?  Possibly
spawning a thread to eat each message?  

Also, a niggle.  There are a whole raft of xmlStr* functions which are
documented without mentioning whether or not there is a requirement for
a terminating null.  I am pretty sure there IS, but I don't have any way
to know without trying.  Maybe this needs to be mentioned somehow in
the docs?   Usually Str means terminating null.  Is this true?

Altogether, however, this is a LOT better than the xerces c++ bugfest.
Thanks for making the effort.

respectfully 
BJ





