Re: [xml] remove node from html document
- From: andrew james <andrew systemssingular com>
- Cc: libxml gnome <xml gnome org>
- Subject: Re: [xml] remove node from html document
- Date: Wed, 17 Feb 2010 13:41:14 -0500
Csaba Raduly wrote:
On Wed, Feb 17, 2010 at 8:44 AM, andrew james
<andrew systemssingular com> wrote:
I have tried xmlUnlinkNode; the result is that the loop through all nodes
stops at the unlinked node.
What is the reason for that stop?
How are you looping through the nodes? Are you sure you are not using
the unlinked node to determine the next node to process?
I had thought of trying other methods, such as copying the whole
document pointer (I could not even write that code). Closer to your
solution, I tried repointing the node pointer to next, prev, or
parent after the unlink; as you know, that failed.
The unlink was done on the current node, and the current node was
then used to continue the loop. I had not saved a copy of the next
node pointer before the unlink, so there was nothing valid to
repoint the current node to afterwards.
// Code like this is wrong:
xmlNodePtr node = first;
while (node != NULL) {
    if (/* some condition */) {
        xmlUnlinkNode(node);
    }
    node = node->next; // ERROR! node is already unlinked, so node->next is no longer valid
}
What you need to do is to save the "next" node _before_ unlink:
xmlNodePtr node = first;
while (node != NULL) {
    xmlNodePtr nextNode = node->next; // saved while node is still linked
    if (/* some condition */) {
        xmlUnlinkNode(node);
    }
    node = nextNode;
}
Csaba
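As for the reason the first loop stops: xmlUnlinkNode resets the unlinked node's own parent, prev and next pointers to NULL, so `node->next` ends the loop right after the unlink. The following is a minimal sketch of that behaviour which does not use libxml2 at all; `Node`, `unlink_node`, `visit_buggy`, `visit_fixed` and `make_list` are hypothetical names invented for the illustration, and the struct only models a sibling list, not a real xmlNode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sibling list; this is not libxml2's xmlNode. */
typedef struct Node {
    int remove_me;          /* plays the role of "some condition" */
    struct Node *prev;
    struct Node *next;
} Node;

/* Detach a node from its siblings and clear its own links,
 * modelled on what xmlUnlinkNode does to prev and next. */
void unlink_node(Node *n) {
    assert(n != NULL);
    if (n->prev != NULL) n->prev->next = n->next;
    if (n->next != NULL) n->next->prev = n->prev;
    n->prev = NULL;
    n->next = NULL;
}

/* Buggy walk: reads n->next after the unlink has already cleared it. */
int visit_buggy(Node *first) {
    int visited = 0;
    for (Node *n = first; n != NULL; n = n->next) {
        visited++;
        if (n->remove_me) unlink_node(n);
    }
    return visited;
}

/* Fixed walk: remembers the successor before unlinking. */
int visit_fixed(Node *first) {
    int visited = 0;
    Node *n = first;
    while (n != NULL) {
        Node *next = n->next;
        visited++;
        if (n->remove_me) unlink_node(n);
        n = next;
    }
    return visited;
}

/* Link three nodes in caller-provided storage and mark the middle one
 * for removal. Returns the first node. */
Node *make_list(Node nodes[3]) {
    for (int i = 0; i < 3; i++) {
        nodes[i].remove_me = (i == 1);
        nodes[i].prev = (i > 0) ? &nodes[i - 1] : NULL;
        nodes[i].next = (i < 2) ? &nodes[i + 1] : NULL;
    }
    return &nodes[0];
}
```

In this model both walks detach the middle node, but the buggy walk never reaches the node after it, which matches the symptom described above.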
Csaba, I added the two lines as you suggested. The program works!
Someone may have a use for the code: it was my first
program with libxml2, and I had found no tutorial on how to
remove a node from an HTML document.
Here is the program; the comments give references to the
programs the code was sourced from.
/*
===license public
===authors
2002, 2003 John Fleck http://www.xmlsoft.org/tutorial/
20091203 Laurent Parenteau
http://laurentparenteau.com/blog/2009/12/parsing-xhtml-in-c-a-libxml2-tutorial/
20100217 andrew swinamer
edit-gtk-doc-html-01.c
===brief program
Parses an HTML file generated by gtk-doc and removes the nodes
'table class navigation' and 'div class footer'.
The intent was to practice working with document structures as
nodes in libxml2.
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <libxml/HTMLparser.h>
#include <libxml/HTMLtree.h> /* declares htmlSaveFileFormat */
/* Detach a node and its subtree from the document, then release the memory. */
void editDocument(xmlNodePtr cur) {
    xmlUnlinkNode(cur);
    xmlFreeNode(cur);
}
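/* Walk a sibling list recursively and remove navigation tables and footer divs. */
void editNodes(xmlNodePtr cur) {
    xmlNodePtr node = cur;
    while (node != NULL) {
        /* Save the successor first: xmlUnlinkNode resets node->next. */
        xmlNodePtr nodeNext = node->next;
        int removed = 0;
        printf("at %s\n", node->name);
        if ( (!xmlStrcmp(node->name, (const xmlChar *) "table"))
          || (!xmlStrcmp(node->name, (const xmlChar *) "div")) ) {
            printf("found %s\n", node->name);
            /* xmlGetProp returns a copy (or NULL) that the caller must free. */
            xmlChar *nodeAttr = xmlGetProp(node, (const xmlChar *) "class");
            if ( (!xmlStrcmp(nodeAttr, (const xmlChar *) "navigation"))
              || (!xmlStrcmp(nodeAttr, (const xmlChar *) "footer")) ) {
                printf("found %s class %s\n", node->name, nodeAttr);
                // remove the unwanted node from the document
                editDocument(node);
                removed = 1;
            } // if node attribute
            if (nodeAttr != NULL) {
                xmlFree(nodeAttr);
            }
        } // if node name
        /* Descend only into nodes that are still part of the document. */
        if (!removed) {
            editNodes(node->children);
        }
        node = nodeNext;
    }
}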
/* Parse the HTML file, edit its nodes, and return the document (NULL on error). */
xmlDocPtr parseDoc(char *docname) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    doc = htmlParseFile(docname, NULL);
    // parser error
    if (doc == NULL) {
        fprintf(stderr, "Document was not parsed successfully\n");
        return NULL;
    }
    // at html
    cur = xmlDocGetRootElement(doc);
    // document errors
    if (cur == NULL) {
        fprintf(stderr, "Document is empty\n");
        xmlFreeDoc(doc);
        return NULL;
    }
    if (xmlStrcmp(cur->name, (const xmlChar *) "html")) {
        fprintf(stderr, "Document is not html\n");
        xmlFreeDoc(doc);
        return NULL;
    }
    // walk the nodes and edit them
    editNodes(cur);
    return doc;
}
int main(int argc, char **argv) {
    const char *docnameEdited;
    char *docname;
    xmlDocPtr doc;
    // both the input and the output file name are required
    if (argc < 3) {
        fprintf(stderr, "Usage: %s docname docnameEdited\n", argv[0]);
        return 1;
    }
    docname = argv[1];
    docnameEdited = argv[2];
    doc = parseDoc(docname);
    if (doc == NULL) {
        return 1;
    }
    htmlSaveFileFormat(docnameEdited, doc, NULL, 1);
    xmlFreeDoc(doc);
    return 0;
}
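The recursive walk has to combine two rules: save the next sibling before removing a node, and descend only into nodes that were kept. Below is a minimal sketch of that combination on a tiny hand-made tree, so it can be tried without libxml2. `TNode` and all the `tnode_*` functions are hypothetical names made up for this illustration; they only model the parent, children, prev and next links and are not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature tree; it only models the links of an xmlNode. */
typedef struct TNode {
    const char *name;
    struct TNode *parent, *children, *prev, *next;
} TNode;

/* Allocate a node and append it as the last child of parent (may be NULL). */
TNode *tnode_new(TNode *parent, const char *name) {
    TNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->name = name;
    n->parent = parent;
    if (parent != NULL) {
        TNode *last = parent->children;
        if (last == NULL) {
            parent->children = n;
        } else {
            while (last->next != NULL) last = last->next;
            last->next = n;
            n->prev = last;
        }
    }
    return n;
}

/* Detach n from its parent and siblings and clear its own links. */
void tnode_unlink(TNode *n) {
    if (n->parent != NULL && n->parent->children == n)
        n->parent->children = n->next;
    if (n->prev != NULL) n->prev->next = n->next;
    if (n->next != NULL) n->next->prev = n->prev;
    n->parent = n->prev = n->next = NULL;
}

/* Free n and everything below it. */
void tnode_free(TNode *n) {
    TNode *child = n->children;
    while (child != NULL) {
        TNode *next = child->next;
        tnode_free(child);
        child = next;
    }
    free(n);
}

/* Remove every subtree whose root has the given name, starting from a
 * sibling list. Returns the number of subtrees removed. */
int tnode_prune(TNode *first, const char *name) {
    int removed = 0;
    TNode *n = first;
    while (n != NULL) {
        TNode *next = n->next;          /* saved before any unlink */
        if (strcmp(n->name, name) == 0) {
            tnode_unlink(n);
            tnode_free(n);
            removed++;
        } else {
            /* only kept nodes are descended into */
            removed += tnode_prune(n->children, name);
        }
        n = next;
    }
    return removed;
}

/* Count the nodes in a sibling list, including all descendants. */
int tnode_count(const TNode *first) {
    int total = 0;
    for (const TNode *n = first; n != NULL; n = n->next)
        total += 1 + tnode_count(n->children);
    return total;
}
```

A caller would build a tree with `tnode_new`, call `tnode_prune(root, "nav")`, and release what is left with `tnode_free(root)`. Freeing the removed subtree immediately is a design choice of this sketch; keeping the unlinked node around for reuse would work as well, as long as it is freed eventually.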
Sample document, encoded as UTF-8:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Compiling the GLib package</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
</head>
<body bgcolor="white" text="black" link="#0000FF"
vlink="#840084" alink="#0000FF">
<table class="navigation" id="top" width="100%"
summary="Navigation header" cellpadding="2"
cellspacing="2"><tr valign="middle">
<td><a accesskey="p" href="glib.html"><img src="left.png"
width="24" height="24" border="0" alt="Prev"></a></td>
<td><a accesskey="u" href="glib.html"><img src="up.png"
width="24" height="24" border="0" alt="Up"></a></td>
<td><a accesskey="h" href="index.html"><img src="home.png"
width="24" height="24" border="0" alt="Home"></a></td>
<th width="100%" align="center">GLib Reference Manual</th>
<td><a accesskey="n" href="glib-cross-compiling.html"><img
src="right.png" width="24" height="24" border="0"
alt="Next"></a></td>
</tr></table>
<div class="refentry" title="Compiling the GLib package">
<a name="glib-building"></a><div class="titlepage"></div>
<div class="refnamediv"><table width="100%"><tr>
<td valign="top">
<h2><span class="refentrytitle">Compiling the GLib
package</span></h2>
<p>Compiling the GLib Package &#8212;
How to compile GLib itself
</p>
</td>
<td valign="top" align="right"></td>
</tr></table></div>
<div class="refsect1" title="Building the Library on UNIX">
<a name="building"></a><h2>Building the Library on UNIX</h2>
<p>
On UNIX, GLib uses the standard GNU build system,
using <span class="application">autoconf</span> for package
configuration and resolving portability issues,
<span class="application">automake</span> for building makefiles
that comply with the GNU Coding Standards, and
<span class="application">libtool</span> for building shared
libraries on multiple platforms. The normal sequence for
compiling and installing the GLib library is thus:
</p>
</div>
<div class="footer">
<hr>
Generated by GTK-Doc V1.12</div>
</body>
</html>
Thanks again.