python re - a not needed
kepes.krisztian  
Newsgroups: comp.lang.python
From: "kepes.krisztian" <kepes.kriszt...@peto.hu>
Date: Thu, 16 Dec 2004 10:06:42 +0100
Local: Thurs, Dec 16 2004 4:06 am
Subject: python re - a not needed
Hi!

I want to extract information from an HTML document, and I need to match every character except <.
"Every character" means everything above chr(31), including the characters above chr(128) (Hungarian accented letters).
The .* pattern is very greedy: it consumes the < characters too.

If I could use a "not", I could simply define a regexp like this:
[not<]*</a>

That would capture everything in the href.

I wrote this program, but I think it is too complex:

import re

l=[]
for i in range(33,65):
    if i<>ord('<') and i<>ord('>'):
       l.append('\\'+chr(i))
s='|'.join(l)
all='\w|\s|\%s-\%s|%s'%(chr(128),chr(255),s)
sre='<Subj>([%s]{1,1024})</d>'%all
#sre='<Subj>([?!\\<]{1,1024})</d>'
s='<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

print sre
print s
cp=re.compile(sre)
m=cp.search(s)
print m.groups()

Does Python have a regexp exclusion or "not" function? How do I use it?

Thanks for the help,
 kk


 
Peter Otten  
Newsgroups: comp.lang.python
From: Peter Otten <__pete...@web.de>
Date: Thu, 16 Dec 2004 10:21:22 +0100
Local: Thurs, Dec 16 2004 4:21 am
Subject: Re: python re - a not needed

You could try these regexps or variants thereof:

"<Subj>([^<]*)"

A leading '^' negates the character set, so it matches any character
that is not listed after the '^'.
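
A minimal sketch of the negated character class in use, written for Python 3 (the thread itself predates it). The sample string is modelled on the one in the question; the variable names are only illustrative.

```python
import re

# Sample input modelled on the string in the question.
s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

# [^<]* matches any run of characters that are not '<',
# so the capture stops right before the first closing tag.
m = re.search(r'<Subj>([^<]*)', s)
subject = m.group(1) if m else None
print(subject)
```

Accented letters need no special handling here, because the class excludes only '<' and accepts everything else.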

"<Subj>(.*?)<"

The '?' makes the preceding '*' non-greedy, i.e. the following '<' will
match the first '<' character encountered after '<Subj>'.
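
To see the difference, here is a small Python 3 sketch that runs the greedy and the non-greedy form against the same input. The sample string and the variable names are assumptions made for illustration.

```python
import re

# Sample input modelled on the string in the question.
s = '<Subj>xmvccv sdfkdsfj eirfie</d><A></d>'

# Greedy: '.*' grabs as much as it can, so the trailing '<'
# ends up matching the LAST '<' in the string.
greedy = re.search(r'<Subj>(.*)<', s).group(1)

# Non-greedy: '.*?' grabs as little as it can, so the trailing '<'
# matches the first '<' after '<Subj>'.
lazy = re.search(r'<Subj>(.*?)<', s).group(1)

print(greedy)
print(lazy)
```

One difference from the negated class: '.' does not match a newline unless re.DOTALL is passed, whereas [^<] does.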

Peter


 
Max M  
Newsgroups: comp.lang.python
From: Max M <m...@mxm.dk>
Date: Thu, 16 Dec 2004 10:52:26 +0100
Local: Thurs, Dec 16 2004 4:52 am
Subject: Re: python re - a not needed

kepes.krisztian wrote:
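> I want to extract information from an HTML document, and I need to match every character except <.
> "Every character" means everything above chr(31), including the characters above chr(128) (Hungarian accented letters).
> The .* pattern is very greedy: it consumes the < characters too.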

Instead of writing ad-hoc HTML parsers, use BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/

It will most likely do what you want in 2 or 3 lines of code.
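
As a rough illustration of the parser-based approach, here is a sketch that uses Python 3's standard-library html.parser as a stand-in. It is not BeautifulSoup and does not show BeautifulSoup's API; the SubjectExtractor class and the sample string are assumptions made for this example.

```python
from html.parser import HTMLParser


class SubjectExtractor(HTMLParser):
    """Hypothetical helper: collects the text that follows a <Subj> tag."""

    def __init__(self):
        super().__init__()
        self.in_subj = False
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        self.in_subj = (tag == 'subj')

    def handle_endtag(self, tag):
        # Any closing tag ends the current subject text.
        self.in_subj = False

    def handle_data(self, data):
        if self.in_subj:
            self.subjects.append(data)


parser = SubjectExtractor()
parser.feed('<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>')
parser.close()
subjects = parser.subjects
print(subjects)
```

This is longer than a regexp, but the parser deals with tag boundaries itself, so no character class has to be built by hand.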

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science


 
Paul Rubin  
Newsgroups: comp.lang.python
From: Paul Rubin <http://phr...@NOSPAM.invalid>
Date: 16 Dec 2004 02:00:17 -0800
Local: Thurs, Dec 16 2004 5:00 am
Subject: Re: python re - a not needed

Max M <m...@mxm.dk> writes:
> Instead of writing ad-hoc html parsers, use BeautifulSoup instead.

> http://www.crummy.com/software/BeautifulSoup/

Hey, I like that.  Thanks.

 