A regex question


kid

Dec 30, 2009, 12:59:07 AM
to pyth...@googlegroups.com
Example HTML:
<html>
<a href="http://www.000011.html">ab000011ab</a>hello123456world
</html>

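Problem: match every six-digit number in the code, except those that are part of existing HTML markup, such as the two occurrences of 000011 above. I only want 123456, which should then be replaced with <a href="http://www.123456.html">123456</a>.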

The final HTML should look like this:
<html>
<a href="http://www.000011.html">ab000011ab</a>hello<a href="http://www.123456.html">123456</a>world
</html>

Is there a cool (clever) way to do this with a regex? A BeautifulSoup approach would be fine too.

Thanks, everyone~

Heroboy

Dec 30, 2009, 1:31:04 AM
to pyth...@googlegroups.com
Off the top of my head: .replace(/<[^>]+>/,'').findall(/[0-9]+/)
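
In Python, that one-liner might look like the sketch below. The original is JavaScript-style pseudocode, so the variable names here are made up for illustration. Note that stripping the tags still leaves the link text behind, so the 000011 inside the <a> element is found along with 123456.

```python
import re

html = '''<html>
<a href="http://www.000011.html">ab000011ab</a>hello123456world
</html>'''

# Drop every tag, then collect the digit runs left in the remaining text.
text_only = re.sub(r'<[^>]+>', '', html)
numbers = re.findall(r'[0-9]+', text_only)
print(numbers)  # ['000011', '123456']
```

This finds the numbers but does not rewrite the HTML, so it covers only the matching half of the question.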

2009/12/30 kid <kid...@gmail.com>

--
From: `python-cn`: CPyUG ~ Chinese Python User Group | Post: pyth...@googlegroups.com
Unsubscribe: http://tinyurl.com/45a9tb // for 163/qq mailboxes: http://tinyurl.com/4dg6hc
Details: https://groups.google.com/group/python-cn
Please note: understand the list! Ask smart questions! http://wiki.woodpecker.org.cn/moin/AskForHelp

Qign

Dec 30, 2009, 3:13:50 AM
to pyth...@googlegroups.com
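import re

aa = '''<html>
<a href="http://www.000011.html">ab000011ab</a>hello123456world
</html>'''

# Group 1 keeps the closing tag and the text before the number, so only the
# six digits (group 2) get wrapped and the text after them is left untouched.
# Note: this links only the first six-digit run after each closing tag.
cc = re.sub(r'(</[^>]+>[^<]*?)(\d{6})',
            r'\1<a href="http://www.\2.html">\2</a>', aa)
print(cc)

Output:
<html>
<a href="http://www.000011.html">ab000011ab</a>hello<a href="http://www.123456.html">123456</a>world
</html>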
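
A different way to leave existing markup alone is to let the pattern match it too and copy it through unchanged, replacing only the bare numbers. The sketch below is an illustration under assumptions, not a full HTML parser: the function and pattern names are hypothetical, and it assumes <a> elements are not nested and that tags contain no stray ">" characters.

```python
import re

# Hypothetical names; the pattern is an illustration, not a full HTML parser.
# Alternative 1 (group 1): a whole <a>...</a> element, or any single tag.
# Alternative 2 (group 2): exactly six digits not adjacent to other digits.
PATTERN = re.compile(
    r'(<a\b[^>]*>.*?</a>|<[^>]+>)|(?<!\d)(\d{6})(?!\d)',
    re.IGNORECASE | re.DOTALL,
)

def link_six_digit_numbers(html):
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)  # existing markup: copy through unchanged
        num = m.group(2)
        return '<a href="http://www.%s.html">%s</a>' % (num, num)
    return PATTERN.sub(repl, html)

sample = '''<html>
<a href="http://www.000011.html">ab000011ab</a>hello123456world
</html>'''
result = link_six_digit_numbers(sample)
print(result)
```

Because the markup alternative is tried first at each position, numbers inside an href or inside existing link text are consumed as part of group 1 and never reach the six-digit branch.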



2009/12/30 Heroboy <yangw...@gmail.com>



--
+++++++++++++++++
         Qign~~
++++++++++++++++

kid

Dec 30, 2009, 3:37:55 AM
to pyth...@googlegroups.com
Using \2 like that is nice. Good.

By the way, when I tested re.sub earlier, the results were rather poor~ Is there anything more efficient? :D

kid

Dec 30, 2009, 3:40:44 AM
to pyth...@googlegroups.com
I mean that re.sub is somewhat slow, and that would hurt the user experience~
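
Before looking for a replacement for re.sub, it may be worth measuring it. The sketch below is a rough way to do that, assuming the substitution runs many times on similar input: it compiles the pattern once and times both variants with timeit. The names are hypothetical and the pattern is only an example in the style of the one in this thread. The re module already caches recently used patterns, so precompiling usually helps only modestly, and no timing figures are claimed here.

```python
import re
import timeit

sample = '<a href="http://www.000011.html">ab000011ab</a>hello123456world'

# Compile once and reuse; group 1 keeps the closing tag and leading text.
compiled = re.compile(r'(</[^>]+>[^<]*?)(\d{6})')
replacement = r'\1<a href="http://www.\2.html">\2</a>'

def with_compiled():
    return compiled.sub(replacement, sample)

def with_module_level():
    return re.sub(r'(</[^>]+>[^<]*?)(\d{6})', replacement, sample)

# Both calls must produce the same text; only the timing may differ.
same_output = with_compiled() == with_module_level()

# Time 10,000 runs of each; results depend on the machine and the input.
t_compiled = timeit.timeit(with_compiled, number=10000)
t_module = timeit.timeit(with_module_level, number=10000)
print(same_output, t_compiled, t_module)
```

If the measured time is still too high, the pattern itself (for example, heavy backtracking on long inputs) is a more likely cause than the call to re.sub.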

2009/12/30 kid <kid...@gmail.com>