[xml] Parsing a file that I didn't create
- From: "Jeffrey Bigham" <jbigham u washington edu>
- To: xml gnome org
- Subject: [xml] Parsing a file that I didn't create
- Date: Sat, 14 Oct 2006 11:04:02 -0700
Hello,
I'd like to use libxml to parse documents on the web that I didn't
create. Some of these are malformed according to the standard, and,
unfortunately, I can't do anything about that. For example, yahoo.com
contains the following piece of code:
<script language=javascript>
if(typeof(YAHOO)!='undefined') {
document.write('<map name="yodel"><area shape="rect"
coords="209,30,216,39" href="http://www.yahoo.com"
onclick="callYodel();return false;"><area shape="poly"
coords="211,0,222,1,215,26,211,25" href="http://www.yahoo.com"
onclick="callYodel();return false;"></map><div id=l_fl
style="position:absolute"></div>');
var lr0='http://us.ard.yahoo.com/SIG=12ldjm870/M=386734.8419383.10128039.81613...
var lcap=0,lncap=0,ad_jsl=0,lnfv=6,ylmap=0;
var ldir="http://us.i1.yimg.com/us.yimg.com/i/mntl/ww/06q3/";
var swfl1=ldir+"yodel.swf";
var swflw=1,swflh=1;
}
...
</script>
libxml messes this up (correctly, per the HTML spec), because the closing
tags inside the <script> element, such as </map> and </div>, aren't
escaped as <\/map> and <\/div>. Is there a
way to use libxml (I'm currently using the SAX parser) without having
it try to fix things for me? If not, is there another C library that
people know of that can just return each tag to me, one at a time,
without enforcing adherence to the standard?
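For the second question, here is a minimal sketch of the kind of tolerant scanner being asked for. It is a hypothetical hand-rolled function, not part of libxml2 or any other library; the names scan_tags, tag_cb, starts_with_ci and print_tag are made up for illustration. It reports each tag's raw text in document order, never checks nesting, and treats everything after an opening <script ...> as raw text until a literal </script>, so markup inside JavaScript strings (like the document.write() call above) is skipped instead of being "fixed".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: receives each tag's raw text, brackets included. */
typedef void (*tag_cb)(const char *tag, size_t len, void *user);

/* Case-insensitive check: do the n bytes at s start with the literal lit? */
static int starts_with_ci(const char *s, size_t n, const char *lit)
{
    size_t k = strlen(lit);
    if (n < k)
        return 0;
    for (size_t j = 0; j < k; j++)
        if (tolower((unsigned char)s[j]) != tolower((unsigned char)lit[j]))
            return 0;
    return 1;
}

/* Report every tag in buf[0..len) via cb (which may be NULL) and return how
 * many were found. No validation, no tree building, no repair. */
size_t scan_tags(const char *buf, size_t len, tag_cb cb, void *user)
{
    size_t i = 0, ntags = 0;
    int in_script = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        if (buf[i] != '<') { i++; continue; }

        /* Treat '<' not followed by a letter, '/', or '!' as plain text. */
        if (i + 1 >= len || !(isalpha((unsigned char)buf[i + 1]) ||
                              buf[i + 1] == '/' || buf[i + 1] == '!')) {
            i++;
            continue;
        }

        /* Inside a script only a literal </script ends raw-text mode;
         * "</map>", "</div>" etc. inside JavaScript strings are skipped. */
        if (in_script) {
            if (!starts_with_ci(buf + i, len - i, "</script")) { i++; continue; }
            in_script = 0;
        }

        const char *end = memchr(buf + i, '>', len - i);
        if (end == NULL)
            break;                          /* unterminated tag: stop */
        size_t tlen = (size_t)(end - (buf + i)) + 1;

        if (cb != NULL)
            cb(buf + i, tlen, user);
        ntags++;

        /* An opening <script> or <script ...> switches to raw-text mode. */
        if (tlen > 7 && starts_with_ci(buf + i, tlen, "<script") &&
            (buf[i + 7] == '>' || isspace((unsigned char)buf[i + 7])))
            in_script = 1;

        i += tlen;
    }
    return ntags;
}

/* Example callback: print each tag on its own line. */
void print_tag(const char *tag, size_t len, void *user)
{
    (void)user;
    printf("%.*s\n", (int)len, tag);
}
```

Calling scan_tags(page, page_len, print_tag, NULL) on the Yahoo snippet should print the <script language=javascript> and </script> tags but none of the <map>/<area>/<div> markup inside the string. This sketch is deliberately naive: a '>' inside a quoted attribute value or inside a comment will end a tag early, and <style> is not given the raw-text treatment.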
Thanks,
Jeff