[xml] Approach for parsing HTML file or URL



Hi all,

I would like to parse HTML and inspect the attribute values of
each tag. For example,

If I have this fragment, <a href="http://www.w3schools.com/">Visit
W3Schools!</a>, then I want to

see that under the tag "a", the "href" attribute has the value
"http://www.w3schools.com/", and the anchor text is "Visit
W3Schools!".

I have written this already: I used htmlReadFile to parse the HTML and
XPath to visit each node.

In http://xmlsoft.org/examples/xpath1.c, I found the print_xpath_nodes
function, which prints each node.

Anyway, I have implemented the parsing and the checking of nodes in the parsed tree, and it works as I expected.

However, I doubt that this is the usual way to do it. Using the htmlReadFile
function is the obvious first step, but I suspect

there is another way to visit each node of the parsed tree without using XPath.

Does anybody know how?

Thanks.


