[xml] Approach for parsing HTML file or URL

From: Brian Kim <09su research gmail com>
To: xml gnome org
Subject: [xml] Approach for parsing HTML file or URL
Date: Mon, 3 Aug 2009 12:34:19 -0400

Hi all,

I would like to parse HTML and inspect the attributes of each tag.
For example, given this markup:

<a href="http://www.w3schools.com/">Visit W3Schools!</a>

I want to see that the "a" tag has an "href" attribute whose value is
"http://www.w3schools.com/" and that the anchor text is
"Visit W3Schools!".

I have implemented this using htmlReadFile to parse the HTML and
XPath to visit each node.

The print_xpath_nodes function in http://xmlsoft.org/examples/xpath1.c
showed me how to print each node.

With that, parsing the file and checking the nodes of the parsed tree
both work as I expected.

However, I am not sure this is the usual way to do it. Using
htmlReadFile seems like the obvious choice for parsing, but I suspect
there is a way to visit each node of the parsed tree without XPath.

Does anybody know of one?

Thanks.

Follow-Ups:
- Re: [xml] Approach for parsing HTML file or URL
  - From: Michael Ludwig
