Re: [xml] Remove whitespaces from text nodes



OK, but in this case it really depends on your input XML format and what you consider "useless".

If the useless whitespace is only "local", as here:

<txt>     Some text </txt>

and you want to get <txt>Some text</txt>, you can still reuse the traversal in the function below and strip
every text node with a small C function of your own (as far as I know, the standard C library has no
function for that).
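
A minimal sketch of such a stripping function, in plain C. The names is_xml_ws and strip_xml_space are
made up for this illustration and are not part of libxml2; the four characters tested are the ones XML 1.0
defines as white space.

```c
#include <stddef.h>
#include <string.h>

/* XML 1.0 white space: space, tab, carriage return, line feed. */
static int is_xml_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper (not a libxml2 function): trim leading and trailing
 * XML white space from a NUL-terminated buffer in place and return the
 * new length. Interior white space is left untouched. */
size_t strip_xml_space(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);

    while (start < end && is_xml_ws(s[start]))
        start++;                          /* skip leading white space */
    while (end > start && is_xml_ws(s[end - 1]))
        end--;                            /* skip trailing white space */

    memmove(s, s + start, end - start);   /* regions may overlap */
    s[end - start] = '\0';
    return end - start;
}
```

With libxml2, the text of a node could presumably be read with xmlNodeGetContent(), stripped like this, and
written back with xmlNodeSetContent(); that wiring is not shown here and has not been tested.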

But, if you have mixed content like:

<p>  Some text<b><i> in bold italics </i></b>and continuing</p>

it is tricky to define "useless" whitespace in a recursive descent because the decision is not local (here,
clearly, the whitespace inside the <i> element should not be removed).

So, depending on your definition of "useless", the difficulty of the answer can range from very simple to 
very complicated.
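
To make the non-local problem concrete, here is a small sketch in plain C (no libxml2 calls) that strips
each of the three text nodes of the <p> example on its own and joins the results, the way a purely local
pass would. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append `text` to the NUL-terminated string in `out`
 * (capacity `cap`) with its leading and trailing XML white space removed.
 * Returns 0 on success, -1 if the result would not fit. */
int append_stripped(char *out, size_t cap, const char *text)
{
    static const char ws[] = " \t\r\n";
    size_t used = strlen(out);
    size_t len;

    text += strspn(text, ws);             /* skip leading white space */
    len = strlen(text);
    while (len > 0 && strchr(ws, text[len - 1]) != NULL)
        len--;                            /* drop trailing white space */

    if (used + len + 1 > cap)
        return -1;
    memcpy(out + used, text, len);
    out[used + len] = '\0';
    return 0;
}

/* Strip the three text nodes of the <p> example independently and join
 * them in document order. */
int strip_mixed_example(char *out, size_t cap)
{
    static const char *const nodes[] = {
        "  Some text",          /* direct child of <p> */
        " in bold italics ",    /* inside <b><i> */
        "and continuing"        /* direct child of <p> */
    };
    size_t i;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < sizeof nodes / sizeof nodes[0]; i++)
        if (append_stripped(out, cap, nodes[i]) != 0)
            return -1;
    return 0;
}
```

The joined text comes out as "Some textin bold italicsand continuing": the spaces inside <i> were the only
thing separating the words, so a node-by-node strip glues them together.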

Best regards,

Georges-André SILBER
LUXIA

On 16 Feb 2012, at 09:32, spam spam spam spam free fr wrote:

I think this function removes blank nodes.
That's not exactly what I want.
I want to strip useless whitespace from text nodes.
These nodes aren't considered blank nodes because they also contain visible characters.

----- Original Message -----
From: "Georges-André SILBER" <gasilber luxia fr>
To: "spam spam spam spam" <spam spam spam spam free fr>
Cc: xml gnome org
Sent: Thursday, 16 February 2012 09:14:17
Subject: Re: [xml] Remove whitespaces from text nodes

Hi,

I wrote a small function for this purpose some time ago.
I haven't tested it with recent versions of libxml2, nor have I verified that the code is correct, but it
gives you an idea of a method you can use to remove blank nodes.

The usage is for instance:

doc = xmlReadFile (xmlfile, NULL, 0);
if (doc == NULL)
  {
    /* Deal with error... */
    return 1;
  }
glbRemoveBlankNodes (xmlDocGetRootElement (doc));

Hope this helps,

Best regards,

Georges-André SILBER
LUXIA

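#include <libxml/tree.h>

/* Recursively remove every blank (whitespace-only) text node below n.
   Always returns 0. */
int
glbRemoveBlankNodes (xmlNodePtr n)
{
  xmlNodePtr cur;
  xmlNodePtr next;

  if (n == NULL)
    return 0;

  cur = n->children;
  while (cur)
    {
      /* Save the sibling first: cur may be freed below. */
      next = cur->next;
      if (xmlIsBlankNode (cur))
        {
          xmlUnlinkNode (cur);
          xmlFreeNode (cur);
        }
      else
        glbRemoveBlankNodes (cur);
      cur = next;
    }

  return 0;
}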


On 16 Feb 2012, at 08:57, spam spam spam spam free fr wrote:

Yes, you are right.
But I am not sure my function will do a good job.
I know of two whitespace characters, " " and "\t", but I am not sure that I know all of them.
My function will probably forget to strip some of them...
This is why I would like to use an already defined function.

Is there a function which does this job?
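
On the question of which characters count: XML 1.0 defines white space (the S production) as exactly four
characters, namely space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA). A short sketch in
plain C follows; the function names are made up for illustration. It also shows why isspace() is not a
drop-in test, since in the "C" locale it accepts vertical tab and form feed as well.

```c
#include <ctype.h>
#include <stddef.h>

/* The S production of XML 1.0: #x20 | #x9 | #xD | #xA, nothing else. */
int is_xml_space(int c)
{
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}

/* Hypothetical helper: count the bytes that isspace() accepts but XML does
 * not treat as white space (vertical tab and form feed in the "C" locale). */
size_t count_non_xml_space(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s) && !is_xml_space((unsigned char)*s))
            n++;
    return n;
}
```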

----- Original Message -----
From: "Liam R E Quin" <liam holoweb net>
To: "spam spam spam spam" <spam spam spam spam free fr>
Cc: xml gnome org
Sent: Thursday, 16 February 2012 08:40:31
Subject: Re: [xml] Remove whitespaces from text nodes

On Thu, 2012-02-16 at 08:28 +0100, spam spam spam spam free fr wrote:
[...].
Anyway, there seems to be no other solution with libxml2 alone.

The spaces are part of the text of the document, so it's not likely that
a conformant XML parser will strip them for you.

You could of course remove the spaces in C after parsing, just as if you
decided to remove every occurrence of an upper-case "B" from the input.

That's just standard C string processing.

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml
