My experience is that a surprising number of companies don't produce correct XML. Walmart, for example, does not put whitespace between attributes:
attr1="value1"attr2="value2"
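A minimal sketch of a pre-filter for this particular problem, assuming plain C++ with no libxml2. The function name `separate_attributes` is hypothetical, and the scanner deliberately ignores comments, CDATA sections and DOCTYPE declarations, so it is a starting point rather than a complete fix:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not a libxml2 API): insert a space after a closing
// attribute quote when the next character begins another attribute name.
// Assumption: the input has no comments, CDATA sections or DOCTYPE.
std::string separate_attributes(const std::string& xml) {
    std::string out;
    out.reserve(xml.size() + 16);
    bool in_tag = false;  // true between '<' and '>'
    char quote = 0;       // active quote character inside a tag, or 0
    for (std::size_t i = 0; i < xml.size(); ++i) {
        const char c = xml[i];
        out.push_back(c);
        if (!in_tag) {
            if (c == '<') in_tag = true;
            continue;
        }
        if (quote != 0) {
            if (c == quote) {
                quote = 0;  // an attribute value just ended
                if (i + 1 < xml.size()) {
                    const char next = xml[i + 1];
                    const bool fine =
                        std::isspace(static_cast<unsigned char>(next)) ||
                        next == '>' || next == '/' || next == '?';
                    if (!fine) out.push_back(' ');
                }
            }
        } else if (c == '"' || c == '\'') {
            quote = c;
        } else if (c == '>') {
            in_tag = false;
        }
    }
    return out;
}
```

Input that already has the whitespace passes through unchanged, so the filter can be applied to every message from the offending sender.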
One company I deal with insists that the entire XML document be a single line (no CR or LF anywhere), which is hard to read and debug. Another won't accept lines longer than 1024.
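For the single-line case, a post-filter can be as small as the sketch below. The name `to_single_line` is hypothetical. It assumes that line breaks in the outgoing document are only there for layout: a break directly after `>` is dropped and any other break becomes one space, which would alter text content that genuinely depends on newlines. It does nothing about the 1024 limit.

```cpp
#include <cassert>
#include <string>

// Hypothetical post-filter: remove every CR and LF from the document.
// A line break directly after '>' is dropped; any other line break is
// replaced by a single space so adjacent attributes or words stay apart.
// Assumption: newlines carry no meaning in this partner's documents.
std::string to_single_line(const std::string& xml) {
    std::string out;
    out.reserve(xml.size());
    for (const char c : xml) {
        if (c == '\r' || c == '\n') {
            if (!out.empty() && out.back() != ' ' && out.back() != '>') {
                out.push_back(' ');
            }
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```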
One can spend a decade trying to get Fedex to change, or Walmart to change, or a year changing libxml2, etc.
I went another way. I have a pre-filter and a post-filter on my XML that make it the way I want it.
So when I receive XML I run it through the pre-filter and change it as needed (based on a libxml2 setup XML file!), and when I send XML I run it through the post-filter and change it as needed, also based on a libxml2 setup XML file.
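The arrangement can be sketched roughly as follows. Everything here is an assumption made for illustration: the rules live in an in-code table rather than in a setup XML file, and the names `Filter`, `PartnerFilters`, `receive`, `send` and `crlf_to_lf` are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A filter takes raw XML text and returns the adjusted text.
using Filter = std::function<std::string(const std::string&)>;

// Hypothetical per-partner rules: filters for incoming and outgoing XML.
struct PartnerFilters {
    std::vector<Filter> pre;   // applied to XML we receive
    std::vector<Filter> post;  // applied to XML we send
};

// Example filter (hypothetical): turn CRLF line endings into LF.
std::string crlf_to_lf(const std::string& xml) {
    std::string out;
    out.reserve(xml.size());
    for (std::size_t i = 0; i < xml.size(); ++i) {
        if (xml[i] == '\r' && i + 1 < xml.size() && xml[i + 1] == '\n') continue;
        out.push_back(xml[i]);
    }
    return out;
}

// Run the text through each filter in order.
std::string run_filters(const std::vector<Filter>& filters, std::string xml) {
    for (const auto& f : filters) xml = f(xml);
    return xml;
}

using FilterTable = std::map<std::string, PartnerFilters>;

// Partners with no entry in the table pass through untouched.
std::string receive(const FilterTable& table, const std::string& partner,
                    const std::string& raw) {
    const auto it = table.find(partner);
    return it == table.end() ? raw : run_filters(it->second.pre, raw);
}

std::string send(const FilterTable& table, const std::string& partner,
                 const std::string& clean) {
    const auto it = table.find(partner);
    return it == table.end() ? clean : run_filters(it->second.post, clean);
}
```

The processing code in the middle only ever sees the output of `receive` and hands its result to `send`, so it never needs to know which partner has which quirk.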
One can be "right" and waste more time than it is worth ... I dare anyone to try and "fix" Fedex or Walmart, who insist they are correct ... or dozens of other companies. It is easier to just fix it yourself, and then process it.
This also has the advantage that the processing code is "clean" -- since your input and output will be standard (whatever you choose) ... only the pre/post filters may need work. Of course you need only filter the goofballs (99% of our stuff goes through as is).
Another advantage is that the pre/post filters don't have to use libxml2 at all (mine don't) -- read through the XML "by hand" in whatever language you use and you can change tags, namespaces, anything you like in any direction. Just make sure the result is valid!
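As an illustration of the "by hand" approach applied to the prefix question quoted below, here is a rough sketch that renames one namespace prefix with plain string replacement. The helper names are hypothetical. It assumes the prefix appears only on element names and in its own `xmlns:` declaration, and that the same character sequences do not occur in text or CDATA. It does not touch prefixed attributes, default namespaces or attribute order.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `s` with `to`.
static void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the text just written
    }
}

// Hypothetical filter: rename a namespace prefix on start tags, end tags
// and the matching xmlns declaration. Purely textual, no XML parsing.
std::string rename_prefix(std::string xml, const std::string& old_prefix,
                          const std::string& new_prefix) {
    replace_all(xml, "<" + old_prefix + ":", "<" + new_prefix + ":");
    replace_all(xml, "</" + old_prefix + ":", "</" + new_prefix + ":");
    replace_all(xml, "xmlns:" + old_prefix + "=", "xmlns:" + new_prefix + "=");
    return xml;
}
```

Calling `rename_prefix(doc, "ed", "n1")` on a document that uses the `ed` prefix rewrites its tags and declaration to `n1`; the result can then be handed to the canonicalization step unchanged.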
Just an idea, but libxml2 cannot do everything for everyone -- especially when huge corporations do as they please. It is a tool kit, not a complete car ... if you want to rebuild the engine, get the tools out, use them, and build your engine ... don't expect the tools to do everything. Instead, just depend on it having all the tools. So far (as a user from way back when it was libxml.a) I have found that it has always worked and been reliable and fast.
Eric
On 1/28/2018 4:19 AM, Mikhail Goloborodko wrote:
Hi All,
I will appreciate if somebody could help on how to normalize and canonicalize XML.
For example
<?xml version="1.0" encoding="WINDOWS-1251"?><ed:N1 attr="4583001999" xmlns:ed="urn:ru:ed:v2.0"></ed:N>
I need to get
<n1:N1 xmlns:n1="urn:cbr-ru:ed:v2.0" attr="4583001999"></n1:N1>
And for
<?xml version="1.0" encoding="WINDOWS-1251"?><N1 attr="4583001999" xmlns="urn:ru:ed:v2.0"><N2 attr="value"></N2></N1>
I need
<n1:N1 xmlns:n1="urn:ru:ed:v2.0"> attr="4583001999"<n1:N2 attr="value"></n1:N2></n1:N1>
In other words I need to remove whitespaces and rewrite namespace prefixes
I use
string src;
xmlChar * canon;
xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr, nullptr, XML_PARSE_NOBLANKS);
int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, & canon);
It removes whitespaces, need help with namespace prefix rewrite.
Thank you in advance.
On Sun, Jan 28, 2018 at 12:41 AM, Mikhail Goloborodko <mgoloborodko gmail com> wrote:
Hi,
I need help on how to normalize and canonicalize XML.
For example
<?xml version="1.0" encoding="WINDOWS-1251"?><ed:N1 attr="4583001999" xmlns:ed="urn:ru:ed:v2.0"></ed:N>
I need to get
<n1:N1 xmlns:n1="urn:cbr-ru:ed:v2.0" attr="4583001999"></n1:N1>
And for
<?xml version="1.0" encoding="WINDOWS-1251"?><N1 attr="4583001999" xmlns="urn:ru:ed:v2.0"><N2 attr="value"></N2></N1>
I need
<n1:N1 attr="4583001999" xmlns="urn:ru:ed:v2.0"><n1:N2 attr="value"></n1:N2></n1:N1>
In other words I need to remove whitespaces and rewrite namespace prefixes
I use
string src;
xmlChar * canon;
xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr, nullptr, XML_PARSE_NOBLANKS);
int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, & canon);
It clearly removes whitespace, need help with namespace prefix rewrite.
Thank you in advance.
Mikhail
_______________________________________________ xml mailing list, project page http://xmlsoft.org/ xml gnome org https://mail.gnome.org/ mailman/listinfo/xml
-- Eric S. Eberhard VICS 2933 W Middle Verde Road Camp Verde, AZ 86322 928-567-3727 work 928-301-7537 cell http://www.vicsmba.com/index.html (our work) http://www.vicsmba.com/ourpics/index.html (fun pictures)