[xml] question about parsing html files

From: Vincent Torri <Vincent Torri iecn u-nancy fr>
To: xml gnome org
Subject: [xml] question about parsing html files
Date: Sat, 19 Feb 2005 11:52:04 +0100 (CET)


Sorry for a possible double post; I forgot the subject.

Hello,

I'm parsing an HTML file which contains, in the body, this code:

  <div>
    TOC <em>emphasied text</em> and <strong>strong text</strong>
  </div>

I parse it in the following way:

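  xmlNodePtr node;
  xmlChar *name;
  /* walk the siblings starting at body_node */
  for (node = body_node ; node ; node = node->next){
    /* node->name is an xmlChar *, so the literal needs BAD_CAST */
    if (xmlStrcasecmp (node->name, BAD_CAST "div") == 0){
      /* joins the text nodes of the list; element children are skipped */
      name = xmlNodeListGetString (file, node->xmlChildrenNode, 1);
      if (name){
        printf ("%s\n", (char *)name);
        xmlFree (name);  /* the returned string belongs to the caller */
      }
    }
  }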

This code displays:

    TOC  and

That is, name is a single string containing 'TOC' and 'and'. Hence, I can't
display the emphasized and strong strings before and after 'and'.

Is there a way to modify the code above so that I can retrieve 'TOC' and
'and' separately?

Thank you,

Vincent Torri
