Re: [xml] Need help on normalization/canonicalization with namespace prefix rewrite



Eric,

Thank you!
I modified some functions in c14n.c to do what I need.

Mikhail

On Mon, Jan 29, 2018 at 10:31 PM, Eric S. Eberhard <eric vicsmba com> wrote:
My experience is that a surprising number of companies don't produce correct XML.  Walmart, for example, does not put whitespace between attributes:

attr1="value1"attr2="value2"

I have one company that insists on one line (no CR or LF) for the entire XML (hard to read and debug).  Another won't take lines longer than 1024.

One can spend a decade trying to get Fedex to change, or Walmart to change, or a year changing libxml2, etc.

I went another way.  I have a pre and post filter on my XML that makes it as I want it.

So when I get XML, I run it through the pre-filter and change it as needed (based on a libxml2 setup XML file!), and when I send it, I run it through a post-filter and change it as needed, also based on a libxml2 setup XML file.
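
As a rough illustration of such a pre-filter step, here is a minimal C++ sketch that repairs the missing-whitespace case shown above. The function name separate_attributes is made up for this example; it is not part of libxml2 and is not the filter described in this message. It assumes the input contains no comments, CDATA sections, or DOCTYPE declarations, which a real filter would have to skip.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical pre-filter step: inside a tag, insert a space when the closing
// quote of an attribute value is immediately followed by the next attribute name.
// Assumption: no comments, CDATA sections, or DOCTYPE in the input.
std::string separate_attributes(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 16);
    bool in_tag = false;  // true between '<' and '>'
    char quote = 0;       // quote character of the attribute value being read, or 0
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        out.push_back(c);
        if (!in_tag) {
            if (c == '<') in_tag = true;
            continue;
        }
        if (quote != 0) {
            if (c == quote) {
                quote = 0;
                // Closing quote directly followed by a name start character:
                // the separator is missing, so add one.
                if (i + 1 < in.size()) {
                    const unsigned char next = static_cast<unsigned char>(in[i + 1]);
                    if (std::isalpha(next) || next == '_' || next == ':') {
                        out.push_back(' ');
                    }
                }
            }
        } else if (c == '"' || c == '\'') {
            quote = c;
        } else if (c == '>') {
            in_tag = false;
        }
    }
    return out;
}
```

Text outside of tags is copied through unchanged, so quotes in element content are not touched.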

One can be "right" and waste more time than it is worth ... I dare anyone to try and "fix" Fedex or Walmart who insist they are correct ... or dozens of other companies.  It is easier to just fix it yourself, and then process it.

This also has the advantage that the processing code is "clean" -- since your input and output will be standard (whatever you choose) ... only the pre/post filters may need work.  Of course you need only filter the goofballs (99% of our stuff goes through as is).

Another advantage is the pre/post filters don't have to use libxml2 at all (mine don't) -- read through them "by hand" with whatever language you use and you can change tags, namespaces, anything you like in any direction.  Just make sure the result is valid!
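
To make the "by hand" idea concrete for the prefix question in this thread, here is a minimal C++ sketch that renames one namespace prefix without libxml2. The function rename_prefix and its parameters are hypothetical names chosen for this example. It only handles an explicit, non-empty prefix (not a default namespace), assumes attributes are already separated by whitespace, assumes there are no comments, CDATA sections, or DOCTYPE declarations, and does not rewrite prefixes that appear inside attribute values or text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: rename old_prefix to new_prefix in element names,
// prefixed attribute names, and the matching xmlns:old_prefix declaration.
// Assumption: old_prefix is non-empty; no comments, CDATA, or DOCTYPE in the input.
std::string rename_prefix(const std::string& in,
                          const std::string& old_prefix,
                          const std::string& new_prefix) {
    const std::string old_qname = old_prefix + ":";      // e.g. "ed:"
    const std::string old_decl = "xmlns:" + old_prefix;  // e.g. "xmlns:ed"
    std::string out;
    out.reserve(in.size());
    bool in_tag = false;  // true between '<' and '>'
    char quote = 0;       // quote character of the attribute value being read, or 0
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (!in_tag) {
            out.push_back(c);
            ++i;
            if (c == '<') {
                in_tag = true;
                if (i < in.size() && in[i] == '/') { out.push_back('/'); ++i; }
                // Element name starts here: swap the prefix if it matches.
                if (in.compare(i, old_qname.size(), old_qname) == 0) {
                    out += new_prefix + ":";
                    i += old_qname.size();
                }
            }
            continue;
        }
        if (quote != 0) {
            if (c == quote) quote = 0;  // attribute values are copied unchanged
        } else if (c == '"' || c == '\'') {
            quote = c;
        } else if (c == '>') {
            in_tag = false;
        } else if (std::isspace(static_cast<unsigned char>(in[i - 1]))) {
            // Preceded by whitespace inside a tag: an attribute name may start here.
            const std::size_t end = i + old_decl.size();
            if (in.compare(i, old_decl.size(), old_decl) == 0 && end < in.size() &&
                (in[end] == '=' || std::isspace(static_cast<unsigned char>(in[end])))) {
                out += "xmlns:" + new_prefix;  // the namespace declaration itself
                i = end;
                continue;
            }
            if (in.compare(i, old_qname.size(), old_qname) == 0) {
                out += new_prefix + ":";       // a prefixed attribute name
                i += old_qname.size();
                continue;
            }
        }
        out.push_back(c);
        ++i;
    }
    return out;
}
```

The filtered text can then be handed to xmlReadMemory and xmlC14NDocDumpMemory as in the original question; canonicalization takes care of ordering the namespace declaration and attributes.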

Just an idea, but libxml2 cannot do everything for everyone -- especially when huge corporations do as they please.  It is a tool kit, not a complete car ... if you want to rebuild the engine, get the tools out, use them, and build your engine ... don't expect the tools to do everything.  Instead, just depend on it having all the tools; so far (as a user from way back when it was libxml.a) it has always worked and been reliable and fast.

Eric

On 1/28/2018 4:19 AM, Mikhail Goloborodko wrote:
Hi All,

I would appreciate it if somebody could help me normalize and canonicalize XML.

For example
<?xml version="1.0" encoding="WINDOWS-1251"?>
<ed:N1 attr="4583001999" xmlns:ed="urn:ru:ed:v2.0">
</ed:N1>

I need to get

<n1:N1 xmlns:n1="urn:ru:ed:v2.0" attr="4583001999"></n1:N1>

And for

<?xml version="1.0" encoding="WINDOWS-1251"?>
<N1 attr="4583001999" xmlns="urn:ru:ed:v2.0">
  <N2 attr="value"></N2>
</N1>

I need

<n1:N1 xmlns:n1="urn:ru:ed:v2.0" attr="4583001999"><n1:N2 attr="value"></n1:N2></n1:N1>

In other words, I need to remove whitespace and rewrite namespace prefixes.
I use
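std::string src;            // input XML document
xmlChar *canon = nullptr;   // receives the canonical form; release with xmlFree()
xmlDocPtr xDoc = xmlReadMemory(src.data(), static_cast<int>(src.size()), nullptr, nullptr, XML_PARSE_NOBLANKS);
// nodes = nullptr: whole document; mode 0 = XML_C14N_1_0; no inclusive prefixes; no comments
int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);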

It removes the whitespace; I need help with the namespace prefix rewrite.

Thank you in advance.

On Sun, Jan 28, 2018 at 12:41 AM, Mikhail Goloborodko <mgoloborodko gmail com> wrote:
Hi,

I need help on how to normalize and canonicalize XML.
For example
<?xml version="1.0" encoding="WINDOWS-1251"?>
<ed:N1 attr="4583001999" xmlns:ed="urn:ru:ed:v2.0">
</ed:N1>

I need to get

<n1:N1 xmlns:n1="urn:ru:ed:v2.0" attr="4583001999"></n1:N1>

And for

<?xml version="1.0" encoding="WINDOWS-1251"?>
<N1 attr="4583001999" xmlns="urn:ru:ed:v2.0">
  <N2 attr="value"></N2>
</N1>

I need

<n1:N1 xmlns:n1="urn:ru:ed:v2.0" attr="4583001999"><n1:N2 attr="value"></n1:N2></n1:N1>

In other words, I need to remove whitespace and rewrite namespace prefixes.
I use
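std::string src;            // input XML document
xmlChar *canon = nullptr;   // receives the canonical form; release with xmlFree()
xmlDocPtr xDoc = xmlReadMemory(src.data(), static_cast<int>(src.size()), nullptr, nullptr, XML_PARSE_NOBLANKS);
// nodes = nullptr: whole document; mode 0 = XML_C14N_1_0; no inclusive prefixes; no comments
int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);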

It removes the whitespace; I need help with the namespace prefix rewrite.

Thank you in advance.

Mikhail 





