Hi,

I'm trying to parse bare.txt (attached; it is simply the page source of cnn.com) using parse.c (also attached). The result is output.txt (attached as well).

In bare.txt there is a <script> block that runs from line 826 to line 886. In output.txt the <script> tag appears on line 759, but the end tag (</script>) is already on line 784. In other words, the end tag lands in the middle of the JavaScript code, which is clearly wrong.

I hope the problem is clear. If not, don't hesitate to ask, either via the list or directly if you prefer.

Thanks for your help.
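P.S. One thing that might be worth checking (an assumption on my side, not something I have confirmed for this file): HTML 4.01 treats script content as CDATA that ends at the first "</" followed by a letter, so a "</" inside a JavaScript string could close the element early. Below is a minimal sketch of a helper that reports where the first such sequence occurs after a <script> start tag. The names find_first_etago and starts_with_ci are hypothetical and are not part of parse.c, and the start-tag handling is deliberately naive.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: case-insensitive check that s begins with prefix. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (*s == '\0' ||
            tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Hypothetical helper: looks at the first <script> element in text.
 * Stores the 1-based line of its start tag in *open_line and the line of
 * the first "</" followed by a letter after that tag in *etago_line.
 * Returns 1 if both were found, 0 otherwise.
 * Assumption: the start tag contains no '>' inside attribute values.
 */
int find_first_etago(const char *text, int *open_line, int *etago_line)
{
    int line = 1;
    int in_script = 0;
    const char *p;

    for (p = text; *p; p++) {
        if (*p == '\n') {
            line++;
            continue;
        }
        if (!in_script) {
            if (starts_with_ci(p, "<script")) {
                const char *gt = strchr(p, '>');
                if (gt == NULL)
                    return 0;
                *open_line = line;
                /* Skip to the end of the start tag, still counting lines. */
                for (; p < gt; p++)
                    if (*p == '\n')
                        line++;
                in_script = 1;
            }
        } else if (p[0] == '<' && p[1] == '/' &&
                   isalpha((unsigned char)p[2])) {
            *etago_line = line;
            return 1;
        }
    }
    return 0;
}
```

To use it, one would read a file such as bare.txt into a NUL-terminated buffer and pass it to find_first_etago. If the reported line is earlier than the line of the real </script>, that would at least be consistent with the early end tag in output.txt.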
Attachment:
bare.txt
Description: Text document
Attachment:
output.txt
Description: Text document
Attachment:
parse.c
Description: Text Data