Newsgroups: perl.beginners
Message-ID: <B9172527B9A149E4AEB9A07E88066562@octavian>
To: <beginn...@perl.org>
Subject: Fast XML parser?
Date: Thu, 25 Oct 2012 14:33:15 +0300
From: orasn...@gmail.com ("Octavian Rasnita")

Hi,

Can you recommend an XML parser which is faster than XML::Twig?

I need an XML parser that can parse XML files chunk by chunk and that is
faster (much faster) than XML::Twig. I tried that module, but it is very slow.

I tried something like the code below. I also tried a version that just
opens the file and parses it with regular expressions, and that inelegant
regexp version is 25 times faster than the XML::Twig one; it also uses
less memory.
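The regexp version is not included in the message. A minimal sketch of that kind of approach reads the file one record at a time by setting the input record separator $/ to '</Lexem>'. It assumes non-nested <Lexem> elements that use the child names from the XML::Twig code (Timestamp, Form, InflectedForm, InflectionId), a quoted id attribute, and no CDATA sections or comments inside a record. The parse_lexems and _unescape names are hypothetical, and this is not the poster's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read <Lexem> records from
# a filehandle one at a time and extract fields with regular expressions.
# Assumes non-nested <Lexem> elements, a quoted id attribute, and no CDATA
# sections or comments inside a record.
sub parse_lexems {
    my ( $fh, $callback ) = @_;

    # Each readline now returns everything up to and including the next
    # closing </Lexem> tag, so only one record is held in memory at a time.
    local $/ = '</Lexem>';

    while ( defined( my $chunk = <$fh> ) ) {
        # The final chunk (text after the last </Lexem>) has no record in it.
        next unless $chunk =~ m{<Lexem\b([^>]*)>(.*?)</Lexem>}s;
        my ( $attrs, $body ) = ( $1, $2 );

        my ($id) = $attrs =~ m{\bid\s*=\s*["']([^"']*)["']};

        # Remove the InflectedForm blocks first so that their <Form> children
        # are not mistaken for the lexem's own <Form>.
        my @inflected;
        while ( $body =~ s{<InflectedForm\b[^>]*>(.*?)</InflectedForm>}{}s ) {
            my $inflected_xml = $1;
            my ($inflection_id)   = $inflected_xml =~ m{<InflectionId>(.*?)</InflectionId>}s;
            my ($inflection_text) = $inflected_xml =~ m{<Form>(.*?)</Form>}s;
            push @inflected, { id => $inflection_id, text => _unescape($inflection_text) };
        }

        my ($timestamp) = $body =~ m{<Timestamp>(.*?)</Timestamp>}s;
        my ($form)      = $body =~ m{<Form>(.*?)</Form>}s;

        $callback->( {
            id        => $id,
            timestamp => $timestamp,
            form      => _unescape($form),
            inflected => \@inflected,
        } );
    }
    return;
}

# Decode only the five predefined XML entities; numeric character
# references such as &#259; are left untouched.
sub _unescape {
    my ($s) = @_;
    return $s unless defined $s;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&quot;/"/g;
    $s =~ s/&apos;/'/g;
    $s =~ s/&amp;/&/g;    # last, so "&amp;lt;" becomes "&lt;", not "<"
    return $s;
}

# Usage with a small made-up document, read through an in-memory filehandle.
# For a real file, open it by path instead, e.g. open my $fh, '<', $path.
my $sample = <<'XML';
<Lexems>
  <Lexem id="42">
    <Timestamp>2012-10-25</Timestamp>
    <Form>mouse</Form>
    <InflectedForm><InflectionId>7</InflectionId><Form>mice</Form></InflectedForm>
  </Lexem>
</Lexems>
XML

open my $fh, '<', \$sample or die "Cannot open sample: $!";
parse_lexems( $fh, sub {
    my ($lexem) = @_;
    # prints: 42 mouse: 7=mice
    print "$lexem->{id} $lexem->{form}: ",
        join( ', ', map { "$_->{id}=$_->{text}" } @{ $lexem->{inflected} } ), "\n";
} );
close $fh;
```

Because each read stops at a closing </Lexem> tag, memory use is bounded by the size of one record. The trade-off is that these patterns are not a real XML parser: input that breaks the stated assumptions (CDATA sections, comments, numeric character references, nested records) will be misread rather than rejected.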

If you know of an XML parsing module that would be faster than regular
expressions, or if I can substantially improve the program that uses
XML::Twig, please tell me. If the regexp version is still faster, I will
use regexps.

Thanks.

use strict;
use warnings;
use XML::Twig;

my $xml = 'path/to/xml/file.xml';

my $t = XML::Twig->new( twig_handlers => {
    # Called each time a complete <Lexem> element has been parsed
    Lexem => sub {
        my( $t, $lexem ) = @_;

        my $id = $lexem->att( 'id' );
        my $timestamp = $lexem->first_child( 'Timestamp' )->text;
        my $lexem_text = $lexem->first_child( 'Form' )->text;
        my @inflected_form = $lexem->children( 'InflectedForm' );

        for my $inflected_form ( @inflected_form ) {
            my $inflection_id = $inflected_form->first_child( 'InflectionId' )->text;
            my $inflection_text = $inflected_form->first_child( 'Form' )->text;
            # ... use $inflection_id and $inflection_text here ...
        }

        # Release the memory used by everything parsed so far
        $t->purge;

        return 1;
    },
} );

# safe_parsefile traps parse errors; it returns false and sets $@ on failure
$t->safe_parsefile( $xml ) or die "Cannot parse $xml: $@";
$t->purge;


--Octavian