[xml] Beginner question: how to parse HTML? How to complete this code fragment?



Hello,
While searching for an HTML parser I found that libxml2 can parse HTML.
I found only one code example showing how to use this part of the library, but it is not complete.
I need help completing it so that I can understand the API.
The example comes from this site:
http://laurentparenteau.com/blog/2009/12/parsing-xhtml-in-c-a-libxml2-tutorial/
Here is the code. It compiles fine, but I am missing the logic for opening an HTML file and reading it:

#include <stdio.h>
#include <libxml/HTMLparser.h>
#include <libxml/tree.h>

void walkTree(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;
    xmlAttr *cur_attr = NULL;
    for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
        /* Do something with the node, e.g. print its name and attributes. */
        printf("Got tag : %s\n", cur_node->name);
        for (cur_attr = cur_node->properties; cur_attr; cur_attr = cur_attr->next) {
            printf("  -> with attribute : %s\n", cur_attr->name);
        }
        /* Recurse into the children of the current node. */
        walkTree(cur_node->children);
    }
}
int main(int argc, char **argv)
{
    htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, XML_CHAR_ENCODING_NONE);
    htmlCtxtUseOptions(parser, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    char *data = NULL; /* buffer containing part of the web page (not filled in yet) */
    int len = 0;       /* number of bytes in data */
    /* Last argument is 0 if the web page isn't complete, and 1 for the final call. */
    htmlParseChunk(parser, data, len, 0);

    walkTree(xmlDocGetRootElement(parser->myDoc));
    return 0;
}

Can you please help me complete the code?
Thanks


