Re: [xml] Remove whitespaces from text nodes
- From: Georges-André SILBER <gasilber luxia fr>
- To: spam spam spam spam free fr
- Cc: xml gnome org
- Subject: Re: [xml] Remove whitespaces from text nodes
- Date: Thu, 16 Feb 2012 09:03:41 -0000
OK, but in this case it really depends on your input XML format and on what you consider "useless".
If the useless whitespace is only "local", as here:
<txt> Some text </txt>
and you want to get <txt>Some text</txt>, you can still use the function below and "strip" every text node
with a C function (I don't think a standard C function exists for that).
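As a rough illustration of such a "strip" function, here is a minimal sketch in plain C. The name strip_in_place is hypothetical (it is not part of libxml2 or the C library), and it relies on isspace() from <ctype.h>, which in the default "C" locale also treats vertical tab and form feed as whitespace.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): removes leading and trailing
 * whitespace from a NUL-terminated string in place and returns s. */
char *
strip_in_place (char *s)
{
  size_t start = 0;
  size_t end;

  if (s == NULL)
    return NULL;
  end = strlen (s);
  /* Skip leading whitespace. */
  while (start < end && isspace ((unsigned char) s[start]))
    start++;
  /* Back up over trailing whitespace. */
  while (end > start && isspace ((unsigned char) s[end - 1]))
    end--;
  /* Shift the kept range to the front and terminate it. */
  memmove (s, s + start, end - start);
  s[end - start] = '\0';
  return s;
}
```

In a libxml2 program, this could presumably be applied to the string returned by xmlNodeGetContent() for each XML_TEXT_NODE, with the result written back via xmlNodeSetContent() and the buffer released with xmlFree(). That wiring is not shown here and is untested.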
But, if you have mixed content like:
<p> Some text<b><i> in bold italics </i></b>and continuing</p>
it is tricky to define "useless" whitespace in a recursive descent because the decision is not local (here,
clearly, the whitespace inside the <i> element should not be removed).
So, depending on your definition of "useless", the difficulty of the answer can range from very simple to
very complicated.
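One possible definition at the simple end of that range, offered only as an assumption and not as the recommended answer, is to collapse each run of whitespace to a single space without trimming anything. Applied to every text node of the mixed-content example, this keeps the space before "in bold italics". The name collapse_whitespace is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: collapses every run of XML whitespace
 * (space, tab, CR, LF) in s to a single space, in place.
 * Leading and trailing runs are reduced to one space, not removed. */
char *
collapse_whitespace (char *s)
{
  char *src = s;
  char *dst = s;
  int in_run = 0;

  if (s == NULL)
    return NULL;
  for (; *src != '\0'; src++)
    {
      if (*src == ' ' || *src == '\t' || *src == '\r' || *src == '\n')
        {
          /* Emit one space for the first character of a run only. */
          if (!in_run)
            *dst++ = ' ';
          in_run = 1;
        }
      else
        {
          *dst++ = *src;
          in_run = 0;
        }
    }
  *dst = '\0';
  return s;
}
```

This choice never joins two words across an element boundary, but it also never removes the leading space in <txt> Some text </txt>, so it answers a different question than a full strip does.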
Best regards,
Georges-André SILBER
LUXIA
On 16 Feb 2012, at 09:32, spam spam spam spam free fr wrote:
I think this function removes blank nodes.
That's not exactly what I want.
I want to strip useless whitespace from text nodes.
These nodes aren't considered blank nodes because they also contain visible characters.
----- Original message -----
From: "Georges-André SILBER" <gasilber luxia fr>
To: "spam spam spam spam" <spam spam spam spam free fr>
Cc: xml gnome org
Sent: Thursday 16 February 2012 09:14:17
Subject: Re: [xml] Remove whitespaces from text nodes
Hi,
I wrote a small function for this purpose some time ago.
I haven't tested it with the latest versions of libxml2, nor have I verified that this code is correct, but it gives
you the idea of a method that you can use to remove blank nodes.
The usage is for instance:
  doc = xmlReadFile (xmlfile, NULL, 0);
  if (doc == NULL)
    {
      /* Deal with error... */
      return 1;
    }
  glbRemoveBlankNodes (xmlDocGetRootElement (doc));
Hope this helps,
Best regards,
Georges-André SILBER
LUXIA
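#include <libxml/tree.h>

/* Recursively removes blank (whitespace-only) text nodes below n.
   Always returns 0. */
int
glbRemoveBlankNodes (xmlNodePtr n)
{
  xmlNodePtr cur;
  xmlNodePtr next;

  if (n == NULL)
    return 0;
  cur = n->children;
  while (cur)
    {
      /* Save the sibling first: cur may be freed below. */
      next = cur->next;
      if (xmlIsBlankNode (cur))
        {
          xmlUnlinkNode (cur);
          xmlFreeNode (cur);
        }
      else
        glbRemoveBlankNodes (cur);
      cur = next;
    }
  return 0;
}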
On 16 Feb 2012, at 08:57, spam spam spam spam free fr wrote:
Yes you are right.
But I am not sure my function will do a good job.
I know a few whitespace characters: " ", "\t", ... But I am not sure that I know all of them.
My function will probably forget to strip some of them...
This is the reason why I would like to use an already defined function.
Is there a function that does this job?
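On the question of which characters count: the XML 1.0 specification defines white space (production S) as exactly four characters, namely space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA). A minimal sketch of a test for those four follows; the name is_xml_space is hypothetical, and whether other characters such as the no-break space should also be treated as "useless" is an application decision that this sketch does not make.

```c
/* Hypothetical helper: returns 1 if c is one of the four characters the
 * XML 1.0 specification treats as white space (production S), else 0. */
int
is_xml_space (int c)
{
  return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}
```

Note that this is narrower than isspace() from <ctype.h>, which in the "C" locale also accepts vertical tab and form feed.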
----- Original message -----
From: "Liam R E Quin" <liam holoweb net>
To: "spam spam spam spam" <spam spam spam spam free fr>
Cc: xml gnome org
Sent: Thursday 16 February 2012 08:40:31
Subject: Re: [xml] Remove whitespaces from text nodes
On Thu, 2012-02-16 at 08:28 +0100, spam spam spam spam free fr wrote:
[...].
Anyway, there seems to be no other solution with libxml2 only.
The spaces are part of the text of the document, so it's not likely that
a conformant XML parser will strip them for you.
You could of course remove the spaces in C after parsing, just as if you
decided to remove every occurrence of an upper-case "B" from the input.
That's just standard C string processing.
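To make the comparison concrete, here is a minimal sketch of that kind of string processing: removing every occurrence of one character from a C string in place. The name remove_char is hypothetical and nothing in it is specific to libxml2.

```c
#include <stddef.h>

/* Hypothetical helper: removes every occurrence of ch from the
 * NUL-terminated string s, in place, and returns s. */
char *
remove_char (char *s, char ch)
{
  char *src = s;
  char *dst = s;

  if (s == NULL)
    return NULL;
  for (; *src != '\0'; src++)
    {
      /* Copy forward only the characters that are kept. */
      if (*src != ch)
        *dst++ = *src;
    }
  *dst = '\0';
  return s;
}
```

Calling remove_char on a writable copy of a text node's content with ch set to 'B' or ' ' works the same way; the hard part is deciding which spaces to remove, not removing them.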
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml