[xml] Patch for HTMLparser



Below is a minor patch for HTMLparser.c:

1. Handle hex character references written with a capital X, i.e. &#X123; as well as &#x123;.

2. Skip to the end of misplaced <body> start tags. Currently, any attributes
of a misplaced <body> tag are parsed as text and end up in the tree wrapped
in a <p> element.

James


Index: HTMLparser.c
===================================================================
RCS file: /cvs/gnome/gnome-xml/HTMLparser.c,v
retrieving revision 1.167
diff -d -u -3 -r1.167 HTMLparser.c
--- HTMLparser.c        31 Oct 2003 10:36:02 -0000      1.167
+++ HTMLparser.c        20 Nov 2003 18:11:57 -0000
@@ -2880,7 +2880,7 @@
     int val = 0;
 
     if ((CUR == '&') && (NXT(1) == '#') &&
-        (NXT(2) == 'x')) {
+        ((NXT(2) == 'x') || NXT(2) == 'X')) {
        SKIP(3);
        while (CUR != ';') {
            if ((CUR >= '0') && (CUR <= '9')) 
@@ -3253,6 +3253,8 @@
                htmlParseErr(ctxt, XML_HTML_STRUCURE_ERROR,
                             "htmlParseStartTag: misplaced <body> tag\n",
                             name, NULL);
+               while ((IS_CHAR_CH(CUR)) && (CUR != '>'))
+                   NEXT;
                return;
            }
        }



