Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Regular expression problem

53 views
Skip to first unread message

mukesh tiwari

unread,
Mar 10, 2013, 1:42:29 PM3/10/13
to
Hello all
I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster.

import re

if __name__ == "__main__":
n = int ( raw_input() )
c = 1
while c <= n :
email = filter ( lambda x : x != None , [ re.search ( '[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*' , x ) for x in raw_input().split(' ') ] )
t = len ( email )
print 'Case #' + str ( c ) + ': ' + str ( t )
for i in xrange ( t ):
print email[i].group()
c += 1

Also rather than breaking the string at space, I tried to use findall but I am not getting correct result.

>>> re.findall ( '[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*[ ]+','mukeshtiwa...@gmail.com mukeshtiwar...@gmail.com')
['com']

I am suppose to get [ mukeshtiwa...@gmail.com , mukeshtiwa...@gmail.com] ?

Regards
Mukesh Tiwari

[1] http://www.spoj.com/problems/MAIN12C/

Chris Angelico

unread,
Mar 10, 2013, 1:59:59 PM3/10/13
to pytho...@python.org
On Mon, Mar 11, 2013 at 4:42 AM, mukesh tiwari
<mukeshtiw...@gmail.com> wrote:
> I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster.

What is the time limit? I just tried it (Python 2.6 under Windows) and
it finished in a humanly-immeasurable amount of time. Are you sure
that STDIN (eg raw_input()) is where your test data is coming from?

ChrisA

mukesh tiwari

unread,
Mar 10, 2013, 2:05:56 PM3/10/13
to pytho...@python.org
Hi Chris
On the problem page, it is 3 second.

> What is the time limit? I just tried it (Python 2.6 under Windows) and
>
> it finished in a humanly-immeasurable amount of time. Are you sure
>
> that STDIN (eg raw_input()) is where your test data is coming from?

Yes, on SPOJ we read data from STDIN.

Regards
Mukesh Tiwari

>
>
>
> ChrisA

Chris Angelico

unread,
Mar 10, 2013, 2:08:44 PM3/10/13
to pytho...@python.org
On Mon, Mar 11, 2013 at 4:59 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 4:42 AM, mukesh tiwari
> <mukeshtiw...@gmail.com> wrote:
>> I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster.
>
> What is the time limit? I just tried it (Python 2.6 under Windows) and
> it finished in a humanly-immeasurable amount of time. Are you sure
> that STDIN (eg raw_input()) is where your test data is coming from?

Oops, reading comprehension fail. Time limit is 3s on a Pentium III.
I've no idea how long your code will take on that hardware, but I
doubt that it's taking three seconds. So my query regarding source of
test data still stands. Can you put together an uber-simple test
program that just echoes the lines of input, to make sure it really is
coming off stdin?

The problem description certainly does seem to imply stdin, but I
can't see why your code would take three seconds unless it's stalling
for some reason. Though perhaps on a P3 with the maximum 100 tests,
maybe that could take a while...

Something to try: Since you're using re.search(), see if you can drop
the complemented sets at the beginning [^~!@#$%^&*()<>?,.]* and end
[^~!@#$%^&*()<>?,.a-zA-Z0-9]* - they're going to be slow to process.
Also, you can simplify this:

[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*

to this:

[a-zA-Z0-9][a-zA-Z0-9._]{4,}

The brace notation means "at least 4, at most infinity".

Try those out and see if you still get the results you want.

ChrisA

mukesh tiwari

unread,
Mar 10, 2013, 2:05:56 PM3/10/13
to comp.lan...@googlegroups.com, pytho...@python.org
Hi Chris
On the problem page, it is 3 second.

> What is the time limit? I just tried it (Python 2.6 under Windows) and
>
> it finished in a humanly-immeasurable amount of time. Are you sure
>
> that STDIN (eg raw_input()) is where your test data is coming from?

mukesh tiwari

unread,
Mar 10, 2013, 2:48:16 PM3/10/13
to pytho...@python.org
Hi Chris
Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

import re

if __name__ == "__main__":
n = int ( raw_input() )
c = 1
while c <= n :
email = filter ( lambda x : x != None , [ re.search ( '[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+.(com|edu|org|co.in)' , x ) for x in raw_input().split(' ') ] )
t = len ( email )
print 'Case #' + str ( c ) + ': ' + str ( t )
for i in xrange ( t ) :
print email[i].group()
c += 1

Regards
Mukesh Tiwari


mukesh tiwari

unread,
Mar 10, 2013, 2:48:16 PM3/10/13
to comp.lan...@googlegroups.com, pytho...@python.org
Hi Chris
Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

import re

if __name__ == "__main__":
n = int ( raw_input() )
c = 1
while c <= n :
email = filter ( lambda x : x != None , [ re.search ( '[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+.(com|edu|org|co.in)' , x ) for x in raw_input().split(' ') ] )
t = len ( email )
print 'Case #' + str ( c ) + ': ' + str ( t )
for i in xrange ( t ) :
print email[i].group()
c += 1

Regards
Mukesh Tiwari


Chris Angelico

unread,
Mar 10, 2013, 2:57:14 PM3/10/13
to pytho...@python.org
On Mon, Mar 11, 2013 at 5:48 AM, mukesh tiwari
<mukeshtiw...@gmail.com> wrote:
> Hi Chris
> Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

Excellent! Have fun.

Incidentally, regular expressions aren't the only way to solve this
sort of problem. If you get stuck with one method, it may be worth
trying another one, to see if you can get around the issue.

As they say, now you have two problems...

ChrisA

Terry Reedy

unread,
Mar 10, 2013, 10:06:12 PM3/10/13
to pytho...@python.org
On 3/10/2013 1:42 PM, mukesh tiwari wrote:
> Hello all
> I am trying to solve this problem[1]
> [1] http://www.spoj.com/problems/MAIN12C/

As I remember, and as it still appears, this site severely penalizes
Python solvers by using the same time limit for all languages. Thus, a
'slow' python program may work correctly but the site will not let you
know. A test that refuses to answer is no test at all. In the meanwhile,
an algorithmically equivalent C program will be run and judged correct,
so the programmer can try to speed up while not losing correctness.

By teaching 'speed before correctness", this site promotes bad
programming habits and thinking (and the use of low-level but faster
languages). I quote your later response: "Now I am getting wrong answer
so at least program is faster then previous one". If the previous one
was correct and the revision wrong, you should toss the revision and go
back to the correct program.

I recommend that you work on problems where you have tests that you can
actually run even before you code.

--
Terry Jan Reedy

jmfauth

unread,
Mar 11, 2013, 5:28:14 AM3/11/13
to
On 11 mar, 03:06, Terry Reedy <tjre...@udel.edu> wrote:

>
> ...
> By teaching 'speed before correctness", this site promotes bad
> programming habits and thinking (and the use of low-level but faster
> languages).
> ...


This is exactly what "your" flexible string representation
does!

And away from technical aspects, you even succeeded to
somehow lose unicode compliance.

jmf

Mark Lawrence

unread,
Mar 11, 2013, 6:19:51 AM3/11/13
to pytho...@python.org
Please stick to something you know about such as sexual self abuse.

--
Cheers.

Mark Lawrence

rusi

unread,
Mar 11, 2013, 9:18:56 AM3/11/13
to
On Mar 11, 2:28 pm, jmfauth <wxjmfa...@gmail.com> wrote:
> On 11 mar, 03:06, Terry Reedy <tjre...@udel.edu> wrote:
>
>
>
> > ...
> > By teaching 'speed before correctness", this site promotes bad
> > programming habits and thinking (and the use of low-level but faster
> > languages).
> > ...
>
> This is exactly what "your" flexible string representation
> does!

This is an old complaint of your with no new data for supporting it

>
> And away from technical aspects, you even succeeded to
> somehow lose unicode compliance.

This is a new complaint.
Just to make it clear:
1. All your recent complaints about unicode were in the realm of
performance
So your complaint that python has lost unicode compliance can mean one
of:
2a. The unicode standard mandates performance criteria
or
2b. There are problems with python's implementation (of strings?) that
have functional problems apart from your old performance complaints
or
2c. You need to look up what 'compliance' means

My own choice is to have a mid-point between
Very early binding: Narrow vs wide builds
Very late binding: String reps can change with the characters as they
are seen
This mid point would be perhaps a commandline switch to choose string-
engine attributes.


However to make this choice even worth a second look we need to have
hard data about performance that you have been unable to provide.

[See the recent thread of RoR vs Django to see the problems of
excessive spurious choice]

Ned Deily

unread,
Mar 11, 2013, 2:13:16 PM3/11/13
to pytho...@python.org
A friendly reminder that this forum is for general discussion and
questions about Python.

"Pretty much anything Python-related is fair game for discussion, and
the group is even fairly tolerant of off-topic digressions; there have
been entertaining discussions of topics such as floating point, good
software design, and other programming languages such as Lisp and Forth."

But ...

"Rudeness and personal attacks, even in reaction to blatant flamebait,
are strongly frowned upon. People may strongly disagree on an issue, but
usually discussion remains civil. In case of an actual flamebait
posting, you can ignore it, quietly plonk the offending poster in your
killfile or mail filters, or write a sharp but still-polite response,
but at all costs resist the urge to flame back."

http://www.python.org/community/lists/

It's up to all of us to help keep this group/list a place where people
enjoy participating, without fear of gratuitous personal sniping.
Thanks!

--
Ned Deily,
n...@acm.org

Serhiy Storchaka

unread,
Mar 11, 2013, 2:30:36 PM3/11/13
to pytho...@python.org
On 11.03.13 04:06, Terry Reedy wrote:
> On 3/10/2013 1:42 PM, mukesh tiwari wrote:
>> Hello all
>> I am trying to solve this problem[1]
>> [1] http://www.spoj.com/problems/MAIN12C/
>
> As I remember, and as it still appears, this site severely penalizes
> Python solvers by using the same time limit for all languages. Thus, a
> 'slow' python program may work correctly but the site will not let you
> know.

I'm sure the time limits are enough to solve most (if not all) of
problems. Actually all submitted solutions on Python for this problem
run from 0.47 to 0.61 seconds (http://www.spoj.com/ranks/MAIN12C/).

Terry Reedy

unread,
Mar 11, 2013, 4:23:21 PM3/11/13
to pytho...@python.org
On 3/11/2013 2:30 PM, Serhiy Storchaka wrote:
> On 11.03.13 04:06, Terry Reedy wrote:
>> On 3/10/2013 1:42 PM, mukesh tiwari wrote:
>>> Hello all
>>> I am trying to solve this problem[1]
>>> [1] http://www.spoj.com/problems/MAIN12C/
>>
>> As I remember, and as it still appears, this site severely penalizes
>> Python solvers by using the same time limit for all languages. Thus, a
>> 'slow' python program may work correctly but the site will not let you
>> know.
>
> I'm sure the time limits are enough to solve most (if not all) of
> problems. Actually all submitted solutions on Python for this problem
> run from 0.47 to 0.61 seconds (http://www.spoj.com/ranks/MAIN12C/).

You do not see the solutions that timed out. I suppose you are pointing
to the fact that for this problem there are solutions close to but under
the time limit. However, algorithm running times are not evenly
distributed. Suppose, for instance) there is a correct O(n**2) solution
and a correct O(n) solution and that the ones listed are the O(n)
solutions. Then the Python O(n**2) solutions could easily take 10x
longer to run and time out, while equivalent C solutions do not.

Mukesh is not the first to post here a reasonable looking solution for
that site that he could not judge because the test quite and refused to
answer. I point out again that he was 'happy' to have a faster but
incorrect program, even though it might have been a regression from his
original.

--
Terry Jan Reedy

0 new messages