[xml] Html from Libxml2 Versus Html from Web browser



Hi, all.

I noticed that, for the same URL, the HTML data retrieved through libxml2 differs from the HTML data a web browser (Firefox) shows.

It is http://www.amazon.com/s/ref=nb_ss_gw?url="">

Here are two partial HTML fragments that represent the same part of the page.

1. From the web browser (Firefox)

<div class="productTitle"><a href="http://www.amazon.com/Garmin-255W-4-3-Inch-Widescreen-Navigator/dp/B0015EWMX8/ref=sr_1_1?ie=UTF8&s=electronics&qid=1249401054&sr=8-1"> Garmin nüvi 255W 4.3-Inch Widescreen Portable GPS Navigator</a> <span class="ptBrand">by Garmin</span></div>
<div class="newPrice"><a href="http://www.amazon.com/Garmin-255W-4-3-Inch-Widescreen-Navigator/dp/B0015EWMX8/ref=sr_1_1?ie=UTF8&s=electronics&qid=1249401054&sr=8-1">Buy new</a>:&nbsp;<strike>$329.99</strike> <span>$170.94</span></div>
<div class="usedPrice"><a href="http://www.amazon.com/gp/offer-listing/B0015EWMX8/ref=sr_1_olp_1?ie=UTF8&s=electronics&qid=1249401054&sr=8-1">73 Used &amp; new</a> from <span>$139.00</span></div>

2. From libxml2

<table class="n2" border="0" cellpadding="0" cellspacing="0" width="100%"><tr>
<td class="imageColumn" width="123"><table border="0" cellpadding="0" cellspacing="0"><tr>
<td align="center" width="115"><a href="http://www.amazon.com/Garmin-255W-4-3-Inch-Widescreen-Navigator/dp/B0015EWMX8/ref=sr_1_1/176-1451973-2375105?ie=UTF8&amp;qid=1249400810&amp;sr=8-1"> <img src="http://ecx.images-amazon.com/images/I/21IKwg51IyL._SL160_AA115_.jpg" class="" border="0" alt="Product Details" width="115" height="115"></a></td>
<td width="8"></td></tr></table></td>
<td class="dataColumn"><table cellpadding="0" cellspacing="0" border="0">
<tr><td>
<a href="http://www.amazon.com/Garmin-255W-4-3-Inch-Widescreen-Navigator/dp/B0015EWMX8/ref=sr_1_1/176-1451973-2375105?ie=UTF8&amp;qid=1249400810&amp;sr=8-1"><span class="srTitle">Garmin nüvi 255W 4.3-Inch Widescreen Portable GPS Navigator</span></a>by Garmin</td></tr>
<tr><td class="priceBlockWithTopPadding">
<span class="priceType"><a href="http://www.amazon.com/Garmin-255W-4-3-Inch-Widescreen-Navigator/dp/B0015EWMX8/ref=sr_1_1/176-1451973-2375105?ie=UTF8&amp;qid=1249400810&amp;sr=8-1">Buy new</a>:</span> <span class="listprice">$329.99</span> <span class="saleprice">$170.94</span>    <span class="usedAndNewPriceBlock"><span class="priceType">
<a href="http://www.amazon.com/gp/offer-listing/B0015EWMX8/ref=sr_1_olp_1/176-1451973-2375105?ie=UTF8&amp;qid=1249400810&amp;sr=8-1">73 Used &amp; new</a>from <span class="otherprice">$139.00</span></span></span></td></tr>


In fragment 1, "Garmin nüvi 255W 4.3-Inch Widescreen Portable GPS Navigator" is the text directly inside the "a" tag, but in fragment 2 it is the text of a "span" tag nested inside the "a" tag.
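
To make the structural difference concrete, here is a minimal sketch that collects the full text of each "a" element, including text held in nested tags such as "span", so the same title comes out of both fragments. It uses Python's standard html.parser instead of libxml2, and the class name AnchorTextCollector, the function anchor_texts, the shortened title, and the "#" hrefs are all made up for illustration.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects the text of every <a> element, including nested children."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while the parser is inside an <a> element
        self.parts = []   # text pieces of the anchor currently being read
        self.texts = []   # finished anchor texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
            if self.depth == 1:
                self.parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.texts.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        # Text inside <a> is kept whether or not it sits in a nested tag.
        if self.depth > 0:
            self.parts.append(data)


def anchor_texts(html):
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


# Shortened stand-ins for the two fragments (hypothetical hrefs).
browser_variant = (
    '<div class="productTitle"><a href="#"> Garmin nüvi 255W</a> '
    '<span class="ptBrand">by Garmin</span></div>'
)
libxml2_variant = (
    '<td><a href="#"><span class="srTitle">Garmin nüvi 255W</span></a>'
    'by Garmin</td>'
)

print(anchor_texts(browser_variant))
print(anchor_texts(libxml2_variant))
```

Both calls return the same one-element list, because the collector ignores which tag inside the anchor actually holds the text. This only works around the symptom; it does not explain why the two fragments differ.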

Where does this difference come from? Why did the association between the tag and the anchor text change?

Can anybody answer this question?

Thanks.


 




