From bradley.kite@gmail.com Sun Aug 16 20:59:55 2009
From: Bradley Kite <bradley.kite@gmail.com>
To: gdome@gnome.org
Date: Sun, 16 Aug 2009 21:59:38 +0100
Message-ID: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com>
Subject: [gdome] Parsing HTML documents?
List-Id: Gnome DOM development list <gdome.gnome.org>

Hello.

I have been looking for a C library that provides a DOM interface to
parsed HTML documents. However, I have been struggling to make gdome
work the way that I'd like (probably because I'm trying to use it
incorrectly, no doubt!).

Firstly, can gdome be used to parse HTML documents? I am aware that
it's more geared towards XML, which, although similar, has obvious
differences!

In any case, I'm using one of the examples as a start, but I'm
getting this error while calling parse():

parser error : StartTag: invalid element name
<!doctype html><head><title>

I guess the main question I have is: am I using the right tool? Or
should I be using something more suited to HTML? If so, would anybody
have any recommendations? I need to be able to modify various
components of the DOM, and it needs to be written in C (or C++, but
preferably C).

Many thanks
--
Bradley Kite

From padovani.luca@gmail.com Mon Aug 17 05:35:19 2009
From: Luca Padovani <lpadovan@cs.unibo.it>
Sender: padovani.luca@gmail.com
To: Bradley Kite <bradley.kite@gmail.com>
Cc: gdome@gnome.org
Date: Mon, 17 Aug 2009 07:35:00 +0200
Message-ID: <cd5d3a640908162235m5de0cb70ia6070af361fd6a7a@mail.gmail.com>
In-Reply-To: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com>
References: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com>
Subject: Re: [gdome] Parsing HTML documents?
List-Id: Gnome DOM development list <gdome.gnome.org>

Hello Bradley,

On Sun, Aug 16, 2009 at 10:59 PM, Bradley Kite <bradley.kite@gmail.com> wrote:
> I guess the main question I have is: am I using the right tool? Or
> should I be using something more suited to HTML? If so, would anybody
> have any recommendations? I need to be able to modify various
> components of the DOM, and it needs to be written in C (or C++, but
> preferably C).

I don't remember the current status of HTML support in Gdome2.
Anyway, I'd recommend having a look at libxml2:

http://xmlsoft.org/

It is the engine that underlies Gdome2 (which is just a wrapper on top
of it). It definitely supports HTML parsing.
Its API is not exactly the DOM one (which is why Gdome2 exists), but it
gets very close. And it's written in plain C.

Best regards,

--luca

From paolo@casarini.org Mon Aug 17 13:23:42 2009
From: Paolo Casarini <paolo@casarini.org>
To: Bradley Kite <bradley.kite@gmail.com>
Cc: gdome@gnome.org
Date: Mon, 17 Aug 2009 15:23:23 +0200
Message-ID: <dc7a47930908170623l3deb6d54ib11e89f1068e225@mail.gmail.com>
In-Reply-To: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com>
References: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com>
Subject: Re: [gdome] Parsing HTML documents?
List-Id: Gnome DOM development list <gdome.gnome.org>

Hello Bradley,

gdome2 works only with well-formed XML documents. The only released
modules (with respect to the DOM specifications) are Main, Events and
XPath; the HTML module is an old, unmaintained implementation that
does not work.

If you deal only with well-formed XHTML documents, you can use gdome2;
otherwise, as Luca already said, libxml2 is the right choice for you.

Best regards,
  Paolo.

On Aug 16, 2009 10:59 PM, "Bradley Kite" <bradley.kite@gmail.com> wrote:

Hello.

I have been looking for a C library that provides a DOM interface to
parsed HTML documents. However, I have been struggling to make gdome
work the way that I'd like (probably because I'm trying to use it
incorrectly, no doubt!).

Firstly, can gdome be used to parse HTML documents? I am aware that
it's more geared towards XML, which, although similar, has obvious
differences!

In any case, I'm using one of the examples as a start, but I'm
getting this error while calling parse():

parser error : StartTag: invalid element name
<!doctype html><head><title>

I guess the main question I have is: am I using the right tool? Or
should I be using something more suited to HTML? If so, would anybody
have any recommendations? I need to be able to modify various
components of the DOM, and it needs to be written in C (or C++, but
preferably C).

Many thanks
--
Bradley Kite
_______________________________________________
gdome mailing list
gdome@gnome.org
http://mail.gnome.org/mailman/listinfo/gdome

From bradley.kite@gmail.com Thu Aug 20 09:12:18 2009
From: Bradley Kite <bradley.kite@gmail.com>
To: gdome@gnome.org
Date: Thu, 20 Aug 2009 10:11:58 +0100
Message-ID: <e97f32c10908200211m3716d03bp2caa27d98396d7df@mail.gmail.com>
In-Reply-To: <dc7a47930908170623l3deb6d54ib11e89f1068e225@mail.gmail.com>
References: <e97f32c10908161359u137b9a04kaaabfc6857c1af16@mail.gmail.com> <dc7a47930908170623l3deb6d54ib11e89f1068e225@mail.gmail.com>
Subject: Re: [gdome] Parsing HTML documents?
List-Id: Gnome DOM development list <gdome.gnome.org>

2009/8/17 Paolo Casarini <paolo@casarini.org>:
> Hello Bradley,
>
> gdome2 works only with well-formed XML documents. The only released
> modules (with respect to the DOM specifications) are Main, Events and
> XPath; the HTML module is an old, unmaintained implementation that
> does not work.
>
> If you deal only with well-formed XHTML documents, you can use gdome2;
> otherwise, as Luca already said, libxml2 is the right choice for you.
>
> Best regards,
>   Paolo.
>
> On Aug 16, 2009 10:59 PM, "Bradley Kite" <bradley.kite@gmail.com> wrote:
>
> Hello.
>
> I have been looking for a C library that provides a DOM interface to
> parsed HTML documents. However, I have been struggling to make gdome
> work the way that I'd like (probably because I'm trying to use it
> incorrectly, no doubt!).
>
> Firstly, can gdome be used to parse HTML documents? I am aware that
> it's more geared towards XML, which, although similar, has obvious
> differences!
>
> In any case, I'm using one of the examples as a start, but I'm
> getting this error while calling parse():
>
> parser error : StartTag: invalid element name
> <!doctype html><head><title>
>
> I guess the main question I have is: am I using the right tool? Or
> should I be using something more suited to HTML? If so, would anybody
> have any recommendations? I need to be able to modify various
> components of the DOM, and it needs to be written in C (or C++, but
> preferably C).
>
> Many thanks
> --
> Bradley Kite

Hi Luca and Paolo,

Many thanks for your responses. I have started using libxml2 and found
it to be exactly what I require!

Kind Regards
--
Brad.