Re: [xml] iterating through an XML document?



Hi,

  In general no. Please do not assume you will be able to get
libxml2 to ignore data. This may or may not work, and the DTD is usually not a
guarantee because documents are usually not valid. Instead of building
a dangerous pile of assumptions to avoid processing a few nodes,
please code the full algorithm and skip those nodes there. You will avoid
wasting a lot of time on design, coding, testing, and later when your users
actually start to use the code. It's not as if testing whether a node is text
and just whitespace is hard, so what???

Thanks for your hints. OK, you convinced me easily; of course I want to
write proper code without any assumptions that will break it at some point.

Also, as an in-between solution, I tried to iterate over the (already
loaded) document and remove the text nodes that contain only
whitespace.

It seems that looping over node->children and removing
some of them in that same loop is not a good idea; at least glibc
aborts my program because of a double free. So I had to restructure
the code.


At the moment the relevant part of my program looks like the code below.
Basically, I iterate recursively through the nodes, text nodes first. If I
find an empty one, I remove it and start over (by returning 1, so the caller
repeats the recursive call). When all empty text nodes are removed, I iterate
over the element nodes and recurse into their children. At startup I call
remove_empty(root_node).


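#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <libxml/tree.h>

/* Removes at most one whitespace-only text node from the sibling list
 * starting at node and returns 1 so that the caller restarts. Otherwise
 * cleans the children of every element in the list and returns 0. */
int do_remove_empty(xmlNode* node) {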
  xmlNode* n;
  int i, is_empty, len;
  xmlChar* val;

  for(n = node; n; n = n->next) {
    if(n->type == XML_TEXT_NODE) {
      val = xmlNodeGetContent(n);
      /* xmlNodeGetContent() may return NULL; treat that as empty */
      len = val ? (int)strlen((const char*)val) : 0;

      is_empty = 1;
      for(i = 0; i < len; i++) {
        printf("%02X ", val[i]);
        if(!isspace(val[i])) {
          is_empty = 0;
        }
      }
      printf("\n");
      xmlFree(val);

      if(is_empty) {
        printf("unlinking %p\n", n);
        xmlUnlinkNode(n);
        xmlFreeNode(n);
        return 1;
      }
    }
  }

  for(n = node; n; n = n->next) {
    if(n->type == XML_ELEMENT_NODE) {
      /* repeat until no whitespace-only text node is left among the children */
      do {
      }while(do_remove_empty(n->children));
    }
  }

  return 0;
}


void remove_empty(xmlNode* node) {
  int s;

  do {
    s = do_remove_empty(node);
  } while(s);
}



void show(xmlNode* node, int indent) {
  xmlNode* n;
  int i;
  xmlAttr* attr;
  xmlChar* ac;
  xmlChar* val;

  for(n = node; n; n = n->next) {
    if(n->type == XML_ELEMENT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      printf("<%s>\n", n->name);
      attr = n->properties;
      while(attr) {
        ac = xmlGetProp(n, attr->name);
        for(i = 0; i < indent+2; i++) printf(" ");
        printf("<%s><%s>\n", attr->name, ac);
        xmlFree(ac);
        attr = attr->next;
      }
      show(n->children, indent+2);
    }
    else if(n->type == XML_TEXT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      val = xmlNodeGetContent(n);
      /* strlen() returns size_t, so cast it to match %i */
      printf("c:%i:<%s>\n", (int)strlen((const char*)val), val);
      xmlFree(val);
    }
  }
}


Best regards,
Torsten.


