Help with inverse of regular expression
From: jax <jackma...@gmail.com>
Date: Thu, 5 Nov 2009 07:49:47 -0800 (PST)
Local: Thurs, Nov 5 2009 10:49 am
Subject: Help with inverse of regular expression
I have a working regex that selects all <def>...</def> tags.

<def\b[^>]*>(.*?)</def>

The problem is that I want to select everything other than the def
tags and their contents.  How would I go about this?


From: "Eugeny.Satt...@gmail.com" <eugeny.satt...@gmail.com>
Date: Fri, 6 Nov 2009 00:52:38 -0800 (PST)
Local: Fri, Nov 6 2009 3:52 am
Subject: Re: Help with inverse of regular expression
Hm... how about deleting all def tags and their contents and storing
the remainder into the result variable?

By the way, if you have nested def tags, your regex may turn out to be
imperfect.
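
A minimal sketch of that suggestion, assuming the text is already held in a scalar. The sub names `strip_defs` and `pieces_outside_defs` and the sample entry are made up for illustration; only the regex comes from the original post. As noted above, the non-greedy match will pair an outer `<def>` with the first inner `</def>` if elements are nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return $text with every <def>...</def> element removed.
sub strip_defs {
    my ($text) = @_;
    # /g removes every occurrence; /s lets . match newlines, because a
    # definition may span several lines.
    $text =~ s#<def\b[^>]*>.*?</def>##gs;
    return $text;
}

# Hypothetical helper: return the pieces of text that lie between <def> elements.
# The pattern has no capture group, so split returns only the surrounding text.
sub pieces_outside_defs {
    my ($text) = @_;
    return split m#<def\b[^>]*>.*?</def>#s, $text;
}

# Made-up sample; real input would be a GCIDE_XML entry.
my $entry = "<hw>Cat</hw> <def>A small\nfeline.</def> <src>1913</src>";

print strip_defs($entry), "\n";
print "[$_]\n" for pieces_outside_defs($entry);
```

The substitution keeps the remainder as one string, while `split` keeps each stretch of non-`<def>` text as a separate list element; which one fits depends on what is done with the remainder afterwards.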



From: jax <jackma...@gmail.com>
Date: Fri, 6 Nov 2009 18:29:57 -0800 (PST)
Local: Fri, Nov 6 2009 9:29 pm
Subject: Re: Help with inverse of regular expression
The problem is that I actually want the <def> tags.  I am editing a
Perl script but don't really know how Perl works.  I have changed
it to my liking but need to strip everything that is not a <def> tag
out of the document.  I want to do a find and replace on all non-<def>
content.  Script below.

#!/usr/bin/perl -w

# This script performs a headword search within GCIDE_XML.
# Submitted by Alexei Puzikov (k...@validio.com.ua),
# and modified by Michael Dyck (jmd...@ibiblio.org).

use CGI qw (param escape);
use CGI::Carp qw (fatalsToBrowser);

$query = param('query');

$script_url = $ENV{"SCRIPT_URL"};
$xml_file_dir = "/home/users/web/b1617/ipg.jackmatt2/dictionary/xml_files";

if ($query eq ""){ $query = "x"; }

# Get the first character of the query.
$first = lc(substr($query,0,1));

if ( $first =~ /[a-z]/ )
{
        # That character is a letter,
        # so we only have to examine the file(s) for that letter.
        $file_pattern = "$xml_file_dir/gcide_${first}*.xml";

}

else
{
        # That character isn't a letter
        # (i.e., it's some sort of regexp metacharacter),
        # so we have to examine all files.
        $file_pattern = "$xml_file_dir/gcide_*.xml";
}

@files = glob( $file_pattern );

$pattern = "<hw>$query</hw>";

foreach $datafile (@files)
{
        # print "$datafile\n";
        open(DATAFILE, '<', $datafile) or die "Cannot open $datafile: $!";
        while (1)
        {
                $astring = <DATAFILE>;
                last if !defined($astring);

                # Headwords appear with syllabification/stress marks * " `.
                # There also might be an initial "&Verbar;".
                # We need to remove those before comparing with $pattern.
                # Possible bug: what if those marks appear in $astring
                # *outside* of the <hw> element?
                $astring =~ s/["*`]//g;
                $astring =~ s/&Verbar;//g;

                if ($astring =~ /$pattern/i)
                {
                        # We have found a headword element that matches the query!

                        # Read lines to the end of this "entry".

                        $result = '';
                        while (1)
                        {
                                chomp($astring);
                                $result = "$result$astring\n";
                                $astring=<DATAFILE>;
                                last if !defined($astring);
                                last if $astring =~ /<hw>/;
                        }

                        # Convert GCIDE_XML tags into their HTML renditions

                        $_ = $result;

                        #s#<hw>#<spelling>#g;
                        #s#</hw>#</spelling>#g;

                        #s#<hwf>#<B>#g;
                        #s#</hwf>#</B>#g;

                        # s#<q>#<BLOCKQUOTE>#g;
                        # s#</q>#</BLOCKQUOTE>#g;

                        #s#<br/>#<BR>#g;
                        #s#<pbr/>#<BR>#g;

                        #$definitions = m#<def\b[^>]*>(.*?)</def>#g;

                        #s#^(.*)(<def\b[^>]*>(.*?)</def>)(.*)$#$1$3#g;

                        #s#^(.*)(<def\b[^>]*>(.*?)</def>)(.*)$#$1$3#;

                        s#<br/>##g;
                        s#<pbr/>##g;
                        s#&##g;

                        s#<(qex|qau|source|xex|pos|fld|ets|etsep|au|src|altname|altnpluf|mark|ex|asp|cref|sd|contr|ant|spn|ord|gen|pluf|uex|stype|mathex|ratio|singf|xlati|iref|figref|ptcl|part|var|tr)># - #g;
                        s#</(qex|qau|source|xex|pos|fld|ets|etsep|au|src|altname|altnpluf|mark|ex|asp|cref|sd|contr|ant|spn|ord|gen|pluf|uex|stype|mathex|ratio|singf|xlati|iref|figref|ptcl|part|var|tr)># - #g;

                        #s#<er>(.+?)</er>#<A href="$script_url?query=$1">$1</A>#g;

                        push @matches, $_;

                        redo;
                        # Redo the body of the while-loop,
                        # in case the <hw> that we just read
                        # (which signalled the end of the previous entry)
                        # is also a match for the query.

                } # End of if.
        } # End of while.
        close(DATAFILE);

}

$count=@matches;

print << "EOM";
Content-type: text/html

<?xml version="1.0" encoding="ISO-8859-1"?>
EOM

print "<definitions count=\"$count\">\n";
if ($count > 0)
{
        foreach $match (@matches)
        {
                print "$match\n";
        }

}

print "</definitions>\n";
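
One possible way to keep only the `<def>` elements, sketched under assumptions: rather than substituting everything else away, collect the matches and rebuild the string from them. The sub name `extract_defs` and the sample entry are hypothetical. The commented-out line `$definitions = m#...#g;` in the script evaluates the match in scalar context, which yields only true or false; list context is what returns the captured text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the contents of every <def> element in $text,
# in document order.
sub extract_defs {
    my ($text) = @_;
    # In list context, m//g returns the captures from every match.
    # /s lets . match newlines, since an entry is read as several lines.
    my @defs = $text =~ m#<def\b[^>]*>(.*?)</def>#gs;
    return @defs;
}

# Made-up sample standing in for the $result string built by the script.
my $result = "<hw>Cat</hw> <pos>n.</pos>\n"
           . "<def>A small\nfeline.</def> <src>1913</src>\n"
           . "<def>A spiteful person.</def>\n";

# Rebuild a string that contains only the <def> elements.
my $only_defs = join "\n", map { "<def>$_</def>" } extract_defs($result);
print "$only_defs\n";
```

In the script above, a string built this way could be assigned to `$_` in place of the tag-by-tag substitutions, before `push @matches, $_;`. Nested `<def>` elements would still be split incorrectly by the non-greedy match.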



From: Ross Presser <rpres...@gmail.com>
Date: Mon, 9 Nov 2009 06:37:48 -0800 (PST)
Local: Mon, Nov 9 2009 9:37 am
Subject: Re: Help with inverse of regular expression
For xml manipulation, you're better off with an xml library than
trying to reinvent one based on regex.
