From bugtrack@roumenpetrov.info Sat Oct 6 16:32:08 2018
From: Roumen Petrov
To: "xml@gnome.org"
Date: Sat, 6 Oct 2018 19:32:00 +0300
References: <4d4c5c83-87f2-9ca9-c231-9e5c974ad31c@aevum.de>
In-Reply-To: <4d4c5c83-87f2-9ca9-c231-9e5c974ad31c@aevum.de>
Subject: Re: [xml] Serialization of documents without encoding

Hi Nick,

Nick Wellnhofer wrote:
> On 25/09/2018 14:36, Nick Wellnhofer wrote:
>> The whole situation is a mess. I'd love to change the code so that
>> non-ASCII chars are always encoded as UTF-8, but I'm scared to break
>> things.

A long time ago I did some tests with HTML:
http://roumenpetrov.info/tests/charset/ .
The case is quite similar. The encoding can be defined externally, in the
HTTP header
...
Content-Type: text/html; charset=ISO8859-5
...
and at the same time internally, in the HTML header
... ....   .... ...
If I remember correctly (10-15 years ago), Internet Explorer preferred the
internal encoding while other browsers preferred the external one.

I created a similar test to check the situation with XML,
http://roumenpetrov.info/tests/charset/index-xml.html, and ran some tests
(browsers: Firefox, Opera, Chromium, Konqueror).

The tests show that all(1) browsers can read XML in the following case:
- HTTP header without charset, i.e. Content-Type: text/html;
- XML prolog with encoding, i.e.

Without an encoding in the prolog, only a file in the UTF-8 codeset can be
read (no surprise).
The behavior of some browsers depends on the file suffix. This is the
reason the tests use the .xml and .none suffixes.
Mixing charset and encoding fails as expected, except in the case
charset=iso8859-1, where some browsers display the content properly.

Based on the tests, I think that if we switch to UTF-8 encoded content by
default, it is good to have the encoding in the prolog. It is less risky.
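
The precedence these tests point at can be sketched in a few lines of C. This is only an illustration under an assumption: that the external (HTTP) charset wins over the prolog when both are present, which matches what the non-IE browsers above are reported to do. The function name effective_encoding() and its parameters are hypothetical and not part of libxml2 or any browser.

```c
#include <stddef.h>

/* Hypothetical helper, not a libxml2 API: pick the encoding a reader
 * would assume for an XML document served over HTTP.
 *   http_charset: charset parameter of Content-Type, or NULL if absent
 *   xml_encoding: encoding declared in the XML prolog, or NULL if absent
 * Assumption: the external declaration takes precedence. */
static const char *effective_encoding(const char *http_charset,
                                      const char *xml_encoding)
{
    if (http_charset != NULL)
        return http_charset;   /* external declaration wins */
    if (xml_encoding != NULL)
        return xml_encoding;   /* otherwise trust the prolog */
    return "UTF-8";            /* nothing declared: only UTF-8 is readable */
}
```

With this model, a document whose prolog and HTTP header disagree is decoded with the header's charset, which is one way to read the "mixing fails" result above.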
> This is the change I have in mind:
>
> https://github.com/nwellnhof/libxml2/commit/53551ec2f6a2ef03bfcfb6d73b6fd18dc70ba15d

It is OK to remove the "Special escaping routines", but the patch shows
that in the regression tests the prolog remains as "". I'm not sure that
such a code modification is safe.

> Nick

Regards,
Roumen

From now@disu.se Fri Oct 12 22:22:11 2018
From: Nikolai Weibull
To: xml@gnome.org
Date: Sat, 13 Oct 2018 00:22:02 +0200
Subject: Re: [xml] Possibly incomplete step 4.7 of the RELAXNG simplification process

Hi!

This has been fixed in merge request #10:

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/10

along with some other RELAX NG issues that I found as well.

  Nikolai

Nikolai Weibull, 2018-09-07 20:49:

> I think that something is broken in the way libxml2 handles step 4.7
> of the RELAXNG simplification process.
>
> Say that we have the following RELAXNG grammars.
>
> a.rng:
>
> b.rng:
>
> c.rng:
>
> The grammar defined in a.rng (and its included files) is accepted by
> Jing as is.
> Libxml2 however, complains that
>
> a.rng:5: element notAllowed: Relax-NG parser error : Some element miss the combine attribute

From now@disu.se Fri Oct 12 22:23:15 2018
From: Nikolai Weibull
To: xml@gnome.org
Date: Sat, 13 Oct 2018 00:23:09 +0200
Subject: Re: [xml] Problem with data in interleave in RELAX NG validation

Hi!

This remains unfixed. I have absolutely no idea what’s going on in
the interleave validation code. Daniel, could you please put together
some minor documentation on how the interleave validation code works?
It’s very complicated.

Thank you,

  Nikolai

Nikolai Weibull, 2018-09-09 21:26:

> Hi!
>
> Given the following input RELAX NG grammar:
>
> datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
>
> and the following input document a.xml:
>
> c
>
> xmllint reports:
>
> a.xml:1: element a: Relax-NG validity error : Element a has extra content: text
> a.xml fails to validate
>
> Changing the interleave to a group solves the issue, so the problem is
> with how interleaves are validated.
>
> I looked at xmlRelaxNGValidateInterleave() and I sadly have no idea
> what’s going on. Please point me in the right direction and I’ll
> gladly write a patch.
>
>   Nikolai

From now@disu.se Sun Oct 14 19:02:37 2018
From: Nikolai Weibull
To: xml@gnome.org
Date: Sun, 14 Oct 2018 21:02:29 +0200
Subject: Re: [xml] Problem with data in interleave in RELAX NG validation

Hi!

OK, I managed to decode it somewhat.
The issue seems to be that we build groups of what can be matched by
the interleave, but that these groups don’t include data, list, and
value elements, only element and text elements. This patch extends
xmlRelaxNGGetElements so that it can return these elements for us in
xmlRelaxNGComputeInterleaves. Then we make sure to update
xmlRelaxNGNodeMatchesList as well so that it accepts the correct types.

The testsuite passes and my test below does as well.

I’m a bit surprised that interleaves simply wouldn’t allow for data,
list, and value elements previously, so I’m wondering if there was a
reason for the code to be the way it was and that the fix should be
placed somewhere else or if it was simply an oversight. Either way,
this does seem to be the correct solution. If someone could confirm
that this solution is what we’re looking for, I’ll add some proper
test cases and apply another merge request on git.gnome.org.
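
The selection rule described above can be modeled outside libxml2 roughly as follows. This is a minimal sketch, not the library's code: the def_type enum and def_selected() are hypothetical stand-ins for the internal XML_RELAXNG_* definition types and for the condition inside xmlRelaxNGGetElements, and only the types mentioned in this thread are represented.

```c
/* Hypothetical stand-ins for libxml2's internal XML_RELAXNG_* types;
 * the real enum in relaxng.c has more members. */
typedef enum {
    DEF_ELEMENT,
    DEF_TEXT,
    DEF_ATTRIBUTE,
    DEF_DATATYPE,
    DEF_LIST,
    DEF_VALUE,
    DEF_OTHER
} def_type;

/* Models the patched condition:
 *   eora 0: elements and text
 *   eora 1: attributes
 *   eora 2: everything an interleave group may match, i.e. elements,
 *           text, data, list and value definitions
 * Returns 1 if a definition of this type would be collected. */
static int def_selected(def_type type, int eora)
{
    switch (eora) {
    case 0:
        return type == DEF_ELEMENT || type == DEF_TEXT;
    case 1:
        return type == DEF_ATTRIBUTE;
    case 2:
        return type == DEF_DATATYPE || type == DEF_ELEMENT ||
               type == DEF_LIST || type == DEF_TEXT ||
               type == DEF_VALUE;
    default:
        return 0;
    }
}
```

In this model a data definition is skipped with eora 0, which is why the interleave group never saw it, and is collected with eora 2.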
Best regards,
  Nikolai

---
 relaxng.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/relaxng.c b/relaxng.c
index 8306e546..3ed03ff4 100644
--- a/relaxng.c
+++ b/relaxng.c
@@ -3993,7 +3993,7 @@ xmlRelaxNGGenerateAttributes(xmlRelaxNGParserCtxtPtr ctxt,
  * xmlRelaxNGGetElements:
  * @ctxt:  a Relax-NG parser context
  * @def:  the definition definition
- * @eora:  gather elements (0) or attributes (1)
+ * @eora:  gather elements (0), attributes (1) or elements and text (2)
  *
  * Compute the list of top elements a definition can generate
  *
@@ -4019,7 +4019,12 @@ xmlRelaxNGGetElements(xmlRelaxNGParserCtxtPtr ctxt,
     while (cur != NULL) {
         if (((eora == 0) && ((cur->type == XML_RELAXNG_ELEMENT) ||
                              (cur->type == XML_RELAXNG_TEXT))) ||
-            ((eora == 1) && (cur->type == XML_RELAXNG_ATTRIBUTE))) {
+            ((eora == 1) && (cur->type == XML_RELAXNG_ATTRIBUTE)) ||
+            ((eora == 2) && ((cur->type == XML_RELAXNG_DATATYPE) ||
+                             (cur->type == XML_RELAXNG_ELEMENT) ||
+                             (cur->type == XML_RELAXNG_LIST) ||
+                             (cur->type == XML_RELAXNG_TEXT) ||
+                             (cur->type == XML_RELAXNG_VALUE)))) {
             if (ret == NULL) {
                 max = 10;
                 ret = (xmlRelaxNGDefinePtr *)
@@ -4374,7 +4379,7 @@ xmlRelaxNGComputeInterleaves(void *payload, void *data,
             if (cur->type == XML_RELAXNG_TEXT)
                 is_mixed++;
             groups[nbgroups]->rule = cur;
-            groups[nbgroups]->defs = xmlRelaxNGGetElements(ctxt, cur, 0);
+            groups[nbgroups]->defs = xmlRelaxNGGetElements(ctxt, cur, 2);
             groups[nbgroups]->attrs = xmlRelaxNGGetElements(ctxt, cur, 1);
             nbgroups++;
             cur = cur->next;
@@ -9280,7 +9285,10 @@ xmlRelaxNGNodeMatchesList(xmlNodePtr node, xmlRelaxNGDefinePtr * list)
             return (1);
         } else if (((node->type == XML_TEXT_NODE) ||
                     (node->type == XML_CDATA_SECTION_NODE)) &&
-                   (cur->type == XML_RELAXNG_TEXT)) {
+                   ((cur->type == XML_RELAXNG_DATATYPE) ||
+                    (cur->type == XML_RELAXNG_LIST) ||
+                    (cur->type == XML_RELAXNG_TEXT) ||
+                    (cur->type == XML_RELAXNG_VALUE))) {
             return (1);
         }
         cur = list[i++];
-- 
2.19.1

Nikolai Weibull, 2018-10-13 00:23:

> Hi!
>
> This remains unfixed. I have absolutely no idea what’s going on in
> the interleave validation code. Daniel, could you please put together
> some minor documentation on how the interleave validation code works?
> It’s very complicated.
>
> Thank you,
>
>   Nikolai
>
> Nikolai Weibull, 2018-09-09 21:26:
>
>> Hi!
>>
>> Given the following input RELAX NG grammar:
>>
>> datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
>>
>> and the following input document a.xml:
>>
>> c
>>
>> xmllint reports:
>>
>> a.xml:1: element a: Relax-NG validity error : Element a has extra content: text
>> a.xml fails to validate
>>
>> Changing the interleave to a group solves the issue, so the problem is
>> with how interleaves are validated.
>>
>> I looked at xmlRelaxNGValidateInterleave() and I sadly have no idea
>> what’s going on. Please point me in the right direction and I’ll
>> gladly write a patch.
>>
>>   Nikolai

From reedstrm@rice.edu Mon Oct 15 14:09:26 2018
From: Ross Reedstrom
To: Nikolai Weibull via xml
Date: Mon, 15 Oct 2018 08:54:13 -0500
Message-ID: <20181015135413.GA806@sensei.cnx.org>
Subject: Re: [xml] Problem with data in interleave in RELAX NG validation

Nikolai -
Glad to see someone attacking these. I've got some RNG schemas that we've
had to use Jing to validate, since libxml2 was giving issues similar to
what you're seeing, and I was even more daunted by the code than you seem
to be. If you could put this branch somewhere I can pull it down from, I'd
love to see if your fixes help my schemas.

Ross

On Sun, Oct 14, 2018 at 09:02:29PM +0200, Nikolai Weibull via xml wrote:
> Hi!
>
> OK, I managed to decode it somewhat. The issue seems to be that we build
> groups of what can be matched by the interleave, but that these groups don’t
> include data, list, and value elements, only element and text elements.
> This patch extends xmlRelaxNGGetElements so that it can return these
> elements for us in xmlRelaxNGComputeInterleaves. Then we make sure to
> update xmlRelaxNGNodeMatchesList as well so that it accepts the correct
> types.
>
> The testsuite passes and my test below does as well.
>
> I’m a bit surprised that interleaves simply wouldn’t allow for data, list,
> and value elements previously, so I’m wondering if there was a reason for
> the code to be the way it was and that the fix should be placed somewhere
> else or if it was simply an oversight. Either way, this does seem to be the
> correct solution. If someone could confirm that this solution is what we’re
> looking for, I’ll add some proper test cases and apply another merge request
> on git.gnome.org.

-- 
Ross Reedstrom, Ph.D.
reedstrm@rice.edu
Senior Developer                    https://cnx.org
phone: 713-348-6166  OpenStax       https://openstax.org
fax: 713-348-3665    Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

From now@disu.se Mon Oct 22 19:50:22 2018
From: Nikolai Weibull
To: xml@gnome.org
Date: Mon, 22 Oct 2018 21:50:14 +0200
Subject: [xml] XInclude and in-scope namespaces

Hi!

I’m trying to do something like the following:

a.xml:

b.xml:

and then

% xmllint --xinclude a.xml

This, however, doesn’t render the output I was expecting, namely

but rather

That is, the in-scope namespaces of the “b” element aren’t being
copied over. I can’t determine from the standard whether this is
as intended or not.
Section 4.5.4 talks about “namespace fixup”, but I can’t for the life of me understand what they’re trying to say, see

https://www.w3.org/TR/xinclude/#namespaces

I tried looking at other implementations, but Xerces doesn’t support the xpointer attribute and I couldn’t find any other XInclude implementations.

Anyway, is this working as intended, or should in-scope namespaces be included?

Nikolai

From flash@vicsmba.com Fri Oct 26 00:24:07 2018 Return-Path: X-Original-To: xml@gnome.org Delivered-To: xml@gnome.org Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.gnome.org (Postfix) with ESMTP id CBCD37611D for ; Fri, 26 Oct 2018 00:24:07 +0000 (UTC) X-Virus-Scanned: by amavisd-new at gnome.org X-Spam-Flag: NO X-Spam-Score: -2.6 X-Spam-Level: X-Spam-Status: No, score=-2.6 tagged_above=-999 required=2 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7] autolearn=ham Received: from smtp.gnome.org ([127.0.0.1]) by localhost (restaurant.gnome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EiKF3TdhKd5e for ; Fri, 26 Oct 2018 00:24:06 +0000 (UTC) Received: from fwd1.spamarrest.com (fwd1.spamarrest.com [208.101.27.208]) by smtp.gnome.org (Postfix) with ESMTPS id 910107611E for ; Fri, 26 Oct 2018 00:24:06 +0000 (UTC) Received: from smtpb.spamarrest.com (smtp02.spamarrest.com [10.13.85.24]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by fwd1.spamarrest.com (Postfix) with ESMTP id 1673988D975 for ; Thu, 25 Oct 2018 19:24:06 -0500 (CDT) Received: from ese (unknown [198.102.68.202]) (Authenticated sender: flash) by smtpb.spamarrest.com (Postfix) with ESMTPA id D1405FBC15A; Thu, 25 Oct 2018 19:24:02 -0500 (CDT) From: "Eric Eberhard" To: "'Nikolai Weibull'" Cc: References: <12614104.296850.1540237832875.JavaMail.root@m05> In-Reply-To: <12614104.296850.1540237832875.JavaMail.root@m05> Date: Thu, 25 Oct 2018 17:23:39 -0700 Message-ID: 
<058801d46cc2$2fcdf8f0$8f69ead0$@vicsmba.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 15.0 Thread-Index: AQDRrxxjQ0dZpSZVH0xxKoZg+HGVDqc2JODA Content-Language: en-us Subject: Re: [xml] XInclude and in-scope namespaces X-BeenThere: xml@gnome.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The Gnome XML library mailing-list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Oct 2018 00:24:08 -0000

I have found namespaces to be completely frustrating. Big companies (like FedEx) don't handle them "properly", in that if a namespace applies to the document they don't carry it through for each block. I finally gave up on using libxml2 namespaces (which are correct, if only the rest of the world would read the specs). I simply use the add-attribute feature to put my namespaces in, and use ns1:tag or ns2:tag as the tags I set. It is a hassle at first, but once you start coding that way, you can handle all the (incorrect and goofy) ways that people handle XML and namespaces.

Eric

-----Original Message-----
From: xml [mailto:xml-bounces@gnome.org] On Behalf Of Nikolai Weibull via xml
Sent: Monday, October 22, 2018 12:50 PM
To: xml@gnome.org
Subject: [xml] XInclude and in-scope namespaces

Hi!

I’m trying to do something like the following:

a.xml:

b.xml:

and then

% xmllint --xinclude a.xml

This, however, doesn’t render the output I was expecting, namely

but rather

That is, the in-scope namespaces of the “b” element aren’t being copied over. I can’t determine from the standard whether this is as intended or not.
Section 4.5.4 talks about “namespace fixup”, but I can’t for the life of me understand what they’re trying to say, see

https://www.w3.org/TR/xinclude/#namespaces

I tried looking at other implementations, but Xerces doesn’t support the xpointer attribute and I couldn’t find any other XInclude implementations.

Anyway, is this working as intended, or should in-scope namespaces be included?

Nikolai
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
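
For readers of the archive, the workaround Eric describes (writing the namespace declarations as ordinary attributes and spelling out prefixed tag names by hand) can be sketched roughly as follows. This is only an illustration in Python's standard xml.etree.ElementTree, not the libxml2 code Eric uses; the namespace URIs, the element names, and the build_order helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this illustration.
NS1 = "http://example.com/ns1"
NS2 = "http://example.com/ns2"


def build_order():
    """Build a small tree using hand-written prefixes (hypothetical helper).

    The xmlns:* declarations are added as plain attributes and the
    prefixes are typed directly into the tag names, so the library's
    own namespace handling is bypassed when the tree is built.
    """
    root = ET.Element("ns1:order", {"xmlns:ns1": NS1, "xmlns:ns2": NS2})
    item = ET.SubElement(root, "ns1:item")
    item.text = "widget"
    note = ET.SubElement(root, "ns2:note")
    note.text = "fragile"
    return root


# Serialize the hand-prefixed tree to a string.
xml_text = ET.tostring(build_order(), encoding="unicode")

# A namespace-aware parser resolves the hand-written prefixes again,
# so consumers that do follow the namespaces spec see qualified names.
reparsed = ET.fromstring(xml_text)
```

The trade-off is that the in-memory tree no longer knows which namespace a node belongs to; only the serialized text carries that information, so any namespace-aware processing has to happen after a reparse.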