Fast XML parser?
Newsgroups: perl.beginners
Subject: Fast XML parser?
Date: Thu, 25 Oct 2012 14:33:15 +0300
From: orasn...@gmail.com ("Octavian Rasnita")
Hi,
Can you recommend an XML parser which is faster than XML::Twig?
I need an XML parser that can process XML files chunk by chunk and that
works much faster than XML::Twig. I tried XML::Twig, but it is very slow.
I tried something like the code below. I also tried a version that just
opens the file and parses it with regular expressions. The inelegant
regexp version is 25 times faster than the one that uses XML::Twig, and
it also uses less memory.
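The regexp version is not included in this message. For readers who want to picture the comparison, here is a minimal sketch of what such an approach might look like. It is not the actual script: the sample data, the parse_lexems name, and the assumption that InflectedForm, Timestamp, and Form carry no attributes are all hypothetical. It reads one Lexem record at a time by setting the input record separator to the closing tag, which is one way to keep memory use low.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real file layout is not shown in the post.
my $sample = <<'XML';
<Lexems>
<Lexem id="1">
<Timestamp>1351166257</Timestamp>
<Form>go</Form>
<InflectedForm><InflectionId>1</InflectionId><Form>goes</Form></InflectedForm>
<InflectedForm><InflectionId>2</InflectionId><Form>went</Form></InflectedForm>
</Lexem>
<Lexem id="2">
<Timestamp>1351166258</Timestamp>
<Form>see</Form>
</Lexem>
</Lexems>
XML

# Hypothetical helper: returns an array ref with one hash per Lexem element.
# Assumes well-behaved input (no CDATA, comments, or nested Lexem elements)
# and does not decode XML entities.
sub parse_lexems {
    my ($fh) = @_;
    my @lexems;

    # Read up to each closing Lexem tag, so only one record is in memory.
    local $/ = '</Lexem>';

    while ( my $chunk = <$fh> ) {
        # Skip chunks without a complete Lexem (e.g. text after the last one).
        my ( $id, $body ) =
          $chunk =~ m{<Lexem\b[^>]*\bid="([^"]*)"[^>]*>(.*)</Lexem>}s
          or next;

        # Cut out the InflectedForm blocks first, so the Form of the lexem
        # itself is not confused with the Form of an inflected form.
        my @inflected;
        while ( $body =~ s{<InflectedForm>(.*?)</InflectedForm>}{}s ) {
            my $inner = $1;
            my ($inflection_id)   = $inner =~ m{<InflectionId>(.*?)</InflectionId>}s;
            my ($inflection_text) = $inner =~ m{<Form>(.*?)</Form>}s;
            push @inflected, { id => $inflection_id, text => $inflection_text };
        }

        my ($timestamp)  = $body =~ m{<Timestamp>(.*?)</Timestamp>}s;
        my ($lexem_text) = $body =~ m{<Form>(.*?)</Form>}s;

        push @lexems,
          {
            id        => $id,
            timestamp => $timestamp,
            text      => $lexem_text,
            inflected => \@inflected,
          };
    }
    return \@lexems;
}

# Usage: parse the in-memory sample; a real run would open the XML file instead.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $lexems = parse_lexems($fh);
close $fh;

printf "%s %s (%d inflected forms)\n",
  $_->{id}, $_->{text}, scalar @{ $_->{inflected} }
  for @$lexems;
```

A sketch like this skips everything a real XML parser checks (well-formedness, encodings, entities), which is part of why it can be much faster and also why it only suits input whose layout is known and stable.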
If you think there is an XML parsing module that would work faster than
regular expressions, or if I can substantially improve the program that
uses XML::Twig, then please tell me about it. If regexps are still
faster, I will use regexps.
Thanks.
use strict;
use warnings;
use XML::Twig;

my $xml = 'path/to/xml/file.xml';

my $t = XML::Twig->new(
    twig_handlers => {
        # Called once for each complete Lexem element.
        Lexem => sub {
            my ( $t, $lexem ) = @_;
            my $id             = $lexem->att('id');
            my $timestamp      = $lexem->first_child('Timestamp')->text;
            my $lexem_text     = $lexem->first_child('Form')->text;
            my @inflected_form = $lexem->children('InflectedForm');
            for my $inflected_form (@inflected_form) {
                my $inflection_id =
                  $inflected_form->first_child('InflectionId')->text;
                my $inflection_text =
                  $inflected_form->first_child('Form')->text;
            }
            $t->purge;    # free the elements parsed so far
            return 1;
        },
    },
);

$t->safe_parsefile($xml);
$t->purge;
--Octavian