From bradley.kite@gmail.com Sun Aug 16 20:59:55 2009
Return-Path:
X-Original-To: gdome@gnome.org
Delivered-To: gdome@gnome.org
Received: from localhost (localhost.localdomain [127.0.0.1])
by menubar.gnome.org (Postfix) with ESMTP id 213277501FF
for ; Sun, 16 Aug 2009 20:59:55 +0000 (GMT)
X-Virus-Scanned: by amavisd-new at gnome.org
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=2
tests=[BAYES_00=-2.599]
X-Amavis-OS-Fingerprint: Linux 2.6 (newer, 2) (up: 808 hrs), (distance 16,
link: ethernet/modem), [209.85.219.209]
Received: from menubar.gnome.org ([127.0.0.1])
by localhost (menubar.gnome.org [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id pmjl83GhRj7A for ;
Sun, 16 Aug 2009 20:59:49 +0000 (GMT)
Received: from mail-ew0-f209.google.com (mail-ew0-f209.google.com
[209.85.219.209])
by menubar.gnome.org (Postfix) with ESMTP id 25DCD7500DA
for ; Sun, 16 Aug 2009 20:59:40 +0000 (GMT)
Received: by ewy5 with SMTP id 5so598779ewy.15
for ; Sun, 16 Aug 2009 13:59:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
h=domainkey-signature:mime-version:received:date:message-id:subject
:from:to:content-type:content-transfer-encoding;
bh=2vh30J5lPNuDS2DSgf/6GiZ0+eQ+CoDBiFS9URIj8bM=;
b=EqagksgxUj0/seiqH6rwaVW+tITztEypr0nb1LAGzhtMZOJjhiWOheWPSrqOp+6tCK
ZwKIQ1CzE6/7dd3PbOnIQ9tThpIfxKF/VXny3Kk2CIZC9GEorIsG/fjyvhoMcPNgAFe/
gZQNZUmmFI3pbZw8WtvCPGrP5Fd3S0b/fTYWY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
h=mime-version:date:message-id:subject:from:to:content-type
:content-transfer-encoding;
b=Kilmv8wIATs6X+4LFloJCFOfQ0vY623mZGE1C31++yCxAHtf4ecW8IpBLmD2hU/LT3
tR5GvvgC5SQ4zjJBZjVJUtFoKZnt475TaYtFxBCh9T7sLiz99TNBo9MtFxtuSPKvJsJm
yKA8FbvNngd2kVM89tf7DDnC7iZ5DMJytsT4s=
MIME-Version: 1.0
Received: by 10.216.89.135 with SMTP id c7mr932531wef.62.1250456378179; Sun,
16 Aug 2009 13:59:38 -0700 (PDT)
Date: Sun, 16 Aug 2009 21:59:38 +0100
Message-ID:
From: Bradley Kite
To: gdome@gnome.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: [gdome] Parsing HTML documents?
X-BeenThere: gdome@gnome.org
X-Mailman-Version: 2.1.10
Precedence: list
List-Id: Gnome DOM development list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 16 Aug 2009 20:59:55 -0000
Hello.
I have been looking for a C library that provides a DOM interface to
parsed HTML documents. I have been trying gdome, but I am struggling to
make it work the way that I'd like (probably because I'm using it
incorrectly!).
Firstly, can gdome be used to parse HTML documents? I am aware that
it's more geared towards XML, which, although similar, has obvious
differences!
In any case, I'm using one of the examples as a starting point, but I'm
getting this error while calling parse():
parser error : StartTag: invalid element name
<!doctype html><head><title>
I guess the main question I have is: am I using the right tool? Or
should I be using something more suited to HTML? If so, would anybody
have any recommendations? I need to be able to modify various
components of the DOM, and it needs to be written in C (or C++, but
preferably C).
Many thanks
--
Bradley Kite
From padovani.luca@gmail.com Mon Aug 17 05:35:19 2009
Return-Path:
X-Original-To: gdome@gnome.org
Delivered-To: gdome@gnome.org
Received: from localhost (localhost.localdomain [127.0.0.1])
by menubar.gnome.org (Postfix) with ESMTP id 7B57F75008F
for ; Mon, 17 Aug 2009 05:35:19 +0000 (GMT)
X-Virus-Scanned: by amavisd-new at gnome.org
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=2
tests=[BAYES_00=-2.599]
X-Amavis-OS-Fingerprint: Linux 2.6 (newer, 2) (up: 11684 hrs), (distance 15,
link: ethernet/modem), [209.85.218.225]
Received: from menubar.gnome.org ([127.0.0.1])
by localhost (menubar.gnome.org [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id M0zdDlcDrHcv for ;
Mon, 17 Aug 2009 05:35:11 +0000 (GMT)
Received: from mail-bw0-f225.google.com (mail-bw0-f225.google.com
[209.85.218.225])
by menubar.gnome.org (Postfix) with ESMTP id 3DE18750088
for ; Mon, 17 Aug 2009 05:35:02 +0000 (GMT)
Received: by bwz25 with SMTP id 25so2263836bwz.35
for ; Sun, 16 Aug 2009 22:35:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
h=domainkey-signature:mime-version:sender:received:in-reply-to
:references:date:x-google-sender-auth:message-id:subject:from:to:cc
:content-type:content-transfer-encoding;
bh=phrEev3TglQircc37QXexviz3kKVg/uzRMG+Uw5+Bjk=;
b=bW139JP52R4EZR34NA5q7rdtHuMnFQuyfS8JxxFJuHKKIuFqbDyhzFHiVQ24z0tUwy
sVfTWXu0qn/e7Bwovbx7BLuUtE3LQXfrjM5bp65FQfol2NSzUgrIChn1Q/kUQOqH9Blr
pRYxqspEi1rbdTVcLpb7oDMuBs7p5UOBG7dak=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
h=mime-version:sender:in-reply-to:references:date
:x-google-sender-auth:message-id:subject:from:to:cc:content-type
:content-transfer-encoding;
b=fLPvcQZmvNqyDVWNXxFaEKsPx7+AoVqfZjwc8YAQvLih/pSk0T9bnUr1CQoe+gWdRy
3c/2jqLgolhbgHcoIImd6u3pNuCenOnmTc437QjAaZEtZcGbo2Uf34RvBbfl1LHDRo7g
iK67S6KY8iGS7EydXsh56WW2a4sAQUKiTbRIs=
MIME-Version: 1.0
Sender: padovani.luca@gmail.com
Received: by 10.223.120.129 with SMTP id d1mr828032far.26.1250487300082; Sun,
16 Aug 2009 22:35:00 -0700 (PDT)
In-Reply-To:
References:
Date: Mon, 17 Aug 2009 07:35:00 +0200
X-Google-Sender-Auth: a618adae2e77b545
Message-ID:
From: Luca Padovani
To: Bradley Kite
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: gdome@gnome.org
Subject: Re: [gdome] Parsing HTML documents?
X-BeenThere: gdome@gnome.org
X-Mailman-Version: 2.1.10
Precedence: list
List-Id: Gnome DOM development list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Aug 2009 05:35:19 -0000
Hello Bradley,
On Sun, Aug 16, 2009 at 10:59 PM, Bradley Kite wrote:
> I guess the main question I have is: am I using the right tool? Or
> should I be using something more suited to HTML? If so, would anybody
> have any recommendations? I need to be able to modify various
> components of the DOM, and it needs to be written in C (or C++, but
> preferably C).
I don't remember the current status of HTML support in Gdome2.
Anyway, I'd recommend having a look at libxml2:
http://xmlsoft.org/
it is the engine that underlies Gdome2 (which is just a wrapper on top
of it). For sure it does support HTML parsing. Its API is not exactly
the DOM one (which is why Gdome2 exists) but it gets very close. And
it's written in plain C.
Best regards,
--luca
From paolo@casarini.org Mon Aug 17 13:23:42 2009
Return-Path:
X-Original-To: gdome@gnome.org
Delivered-To: gdome@gnome.org
Received: from localhost (localhost.localdomain [127.0.0.1])
by menubar.gnome.org (Postfix) with ESMTP id 98D647501C4
for ; Mon, 17 Aug 2009 13:23:42 +0000 (GMT)
X-Virus-Scanned: by amavisd-new at gnome.org
X-Spam-Flag: NO
X-Spam-Score: -2.444
X-Spam-Level:
X-Spam-Status: No, score=-2.444 tagged_above=-999 required=2
tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, TW_BX=0.077, TW_IB=0.077]
X-Amavis-OS-Fingerprint: Linux 2.6 (newer, 2) (up: 1082 hrs), (distance 16,
link: ethernet/modem), [209.85.219.212]
Received: from menubar.gnome.org ([127.0.0.1])
by localhost (menubar.gnome.org [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id XIeOPNF59US0 for ;
Mon, 17 Aug 2009 13:23:35 +0000 (GMT)
Received: from mail-ew0-f212.google.com (mail-ew0-f212.google.com
[209.85.219.212])
by menubar.gnome.org (Postfix) with ESMTP id 62EDE750192
for ; Mon, 17 Aug 2009 13:23:25 +0000 (GMT)
Received: by ewy8 with SMTP id 8so240755ewy.15
for ; Mon, 17 Aug 2009 06:23:23 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.216.53.197 with SMTP id g47mr993948wec.91.1250515403221; Mon,
17 Aug 2009 06:23:23 -0700 (PDT)
In-Reply-To:
References:
Date: Mon, 17 Aug 2009 15:23:23 +0200
Message-ID:
From: Paolo Casarini
To: Bradley Kite
Content-Type: multipart/alternative; boundary=0016e6dee7676529f30471564e77
Cc: gdome@gnome.org
Subject: Re: [gdome] Parsing HTML documents?
X-BeenThere: gdome@gnome.org
X-Mailman-Version: 2.1.10
Precedence: list
List-Id: Gnome DOM development list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Aug 2009 13:23:42 -0000
--0016e6dee7676529f30471564e77
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Hello Bradley,
gdome2 works only with well-formed XML documents. The only released
modules (with respect to the DOM specifications) are Main, Events and
XPath; the HTML module is an old, non-working, unmaintained implementation.
If you deal only with well formed XHTML documents, you can use gdome2,
otherwise, as Luca already said, libxml2 is the right choice for you.
Best regards,
Paolo.
--0016e6dee7676529f30471564e77--
From bradley.kite@gmail.com Thu Aug 20 09:12:18 2009
Return-Path:
X-Original-To: gdome@gnome.org
Delivered-To: gdome@gnome.org
Received: from localhost (localhost.localdomain [127.0.0.1])
by menubar.gnome.org (Postfix) with ESMTP id 980B4750068
for ; Thu, 20 Aug 2009 09:12:18 +0000 (GMT)
X-Virus-Scanned: by amavisd-new at gnome.org
X-Spam-Flag: NO
X-Spam-Score: -2.445
X-Spam-Level:
X-Spam-Status: No, score=-2.445 tagged_above=-999 required=2
tests=[BAYES_00=-2.599, TW_BX=0.077, TW_IB=0.077]
X-Amavis-OS-Fingerprint: Linux 2.6 (newer, 2) (up: 9861 hrs), (distance 16,
link: ethernet/modem), [209.85.219.210]
Received: from menubar.gnome.org ([127.0.0.1])
by localhost (menubar.gnome.org [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id RdqcKZ3qSa57 for ;
Thu, 20 Aug 2009 09:12:10 +0000 (GMT)
Received: from mail-ew0-f210.google.com (mail-ew0-f210.google.com
[209.85.219.210])
by menubar.gnome.org (Postfix) with ESMTP id E8B867500C5
for ; Thu, 20 Aug 2009 09:12:01 +0000 (GMT)
Received: by ewy6 with SMTP id 6so1704551ewy.34
for ; Thu, 20 Aug 2009 02:11:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
h=domainkey-signature:mime-version:received:in-reply-to:references
:date:message-id:subject:from:to:content-type
:content-transfer-encoding;
bh=c9y4j2RS6bm2U5NPg8NSWXGlCu0sXUgLdg2x161iRCk=;
b=NYXKLcx8CIlugZG/bx09x7EDks3mJAcTd7M88Esm5wZXSdEQhx2Fi+491IpxjZ5FRn
OgR4z1fIh6W2u4p2Femti3wGT28woYQYEF84r/I/pRC4LRUs/pG+T14snT+IC5JsP+zZ
Zu7+HJ5adFI/zk6DEQCRwHZvAPnBQHt7VzhuE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to
:content-type:content-transfer-encoding;
b=PIFHeA60Fx8Q7SO52fBTmdHzJ5rydxPrsZ0WnzKysX6mRNUonZSP4nOXIEeoNYqr+h
c4HsUkhtLWGrjHgNaSvl7bqkdbjJh32oyPookplkBsxFzRlm+G1r3iSyycDqB+sd9992
Jo6RUeP8c3kBrB3a19f6yGJOGSv5yquOKCp/M=
MIME-Version: 1.0
Received: by 10.216.8.209 with SMTP id 59mr1772050wer.18.1250759518780; Thu,
20 Aug 2009 02:11:58 -0700 (PDT)
In-Reply-To:
References:
Date: Thu, 20 Aug 2009 10:11:58 +0100
Message-ID:
From: Bradley Kite
To: gdome@gnome.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: [gdome] Parsing HTML documents?
X-BeenThere: gdome@gnome.org
X-Mailman-Version: 2.1.10
Precedence: list
List-Id: Gnome DOM development list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 20 Aug 2009 09:12:18 -0000
2009/8/17 Paolo Casarini :
> Hello Bradley,
>
>   gdome2 works only with well formed XML documents, the only released
> modules (wrt DOM specifications) are Main, Events and XPath: the HTML module
> is an old, not working and not maintained implementation.
>
> If you deal only with well formed XHTML documents, you can use gdome2,
> otherwise, as Luca already said, libxml2 is the right choice for you.
>
> Best regards,
>   Paolo.
Hi Luca and Paolo
Many thanks for your responses. I have started using libxml2 and found
it to be exactly what I require!
Kind Regards
--
Brad.