Good regexp tutorials

Andrei Savu

Sep 20, 2007, 9:57:21 AM
to php...@googlegroups.com
Hi Mario.

This email is not directly related to the PHP IDS project. It seems that most of the readers
of this group have excellent knowledge of regular expressions.

I am a beginner with regular expressions and have had some problems with them. I managed
to solve them by reading some tutorials on the internet, but I want to know more.

How can I improve my understanding of regular expressions? Can you point me to some
good tutorials or books that can help me with this?

Many thanks in advance.

--
'Discipline is the bridge between goals and accomplishments.' -Jim Rohn
"Set your goals high, and don't stop till you get there." Bo Jackson

Mario Heiderich

Sep 20, 2007, 10:32:12 AM
to php...@googlegroups.com
Hey Andrei

I can recommend the following page for advanced and well-written info:
http://www.regular-expressions.info/

Also you don't wanna miss http://rexv.org/

A very good book is this one:
http://www.regular-expressions.info/

Hope that helped!
Cheers,
.mario


--
_______________________
php-ids.org

Mario Heiderich

Sep 20, 2007, 10:32:37 AM
to php...@googlegroups.com
Sorry, wrong book link. Here is the correct one:
http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124

Andrei Savu

Sep 20, 2007, 11:09:53 AM
to php...@googlegroups.com
Thanks.

I think learning regular expressions is a lot like learning another programming
language: it needs a lot of practice. I admire your work. You create some amazing
regular expressions.

I think the field of regular expressions will evolve a lot in the future. A system capable of
generating regular expressions for given sets would be really amazing. I think this would be the
first form of real AI, a strong automatic classification engine. But I don't think we will see this kind
of technology very soon, or maybe we will be surprised. It's hard to know what Google is cooking.

There is a video on http://www.ted.com that presents a new way of seeing human intelligence.
In that video the brain is seen as a machine that stores patterns and reacts to patterns with other
known patterns. This idea is interesting, at least.

This talk about AI could get very long, so I will stop here.

Thanks again for the fast response. If anyone else could tell how they learned to use regular
expressions, that would be great.

Mario Heiderich

Sep 20, 2007, 11:16:15 AM
to php...@googlegroups.com
As a matter of fact, I had a short chat about that with Gareth, who brought it up. We want to try some experiments on comparing attack patterns using Levenshtein distance and the Soundex method. Let's see what we can learn from that and use for the PHPIDS.
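
For readers who have not met the Levenshtein distance mentioned here: it counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is only a minimal Python sketch of that idea, not PHPIDS code. The function names `levenshtein` and `similarity` and the sample strings are assumptions made for illustration. (PHP itself ships built-in `levenshtein()` and `soundex()` functions.)

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b.

    Allowed edits: insert, delete, or substitute one character.
    """
    # Keep the shorter string in b so the rows stay as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to 0.0 (different) .. 1.0 (equal)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

With this sketch, `levenshtein("<script>", "<scr ipt>")` is 1, because a single inserted space separates the two strings. Whether such a score is useful for comparing real attack patterns is exactly what the experiments would have to show.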

I will keep you guys informed about that...

Mario Heiderich

Sep 20, 2007, 4:06:41 PM
to PHPIDS » Web Application Security 2.0
I just had a chat with pdp and we talked about regex in general. He
recommended this reading (I am just going through it myself):

http://docs.python.org/lib/re-syntax.html

Andrei Savu

Sep 20, 2007, 4:54:32 PM
to php...@googlegroups.com
Thanks again.

I have managed to get my hands on the O'Reilly book Mastering Regular Expressions.
I will read it first and then look for other sources.

Timo Derstappen

Sep 20, 2007, 5:09:37 PM
to php...@googlegroups.com
On 9/20/07, Mario Heiderich <mario.h...@googlemail.com> wrote:
> As a matter of fact I had a small chat about that with Gareth who brought
> it up. We want to try some experiments on attack pattern comparison with the
> usage of levenshtein distance and the soundex methods. Let's see what we can
> learn and use for the PHPIDS from that stuff.

It's definitely an interesting topic to dive into, but I don't think
that you will achieve anything as good as what you did with the regex
approach. Currently it is definitely Sisyphean work to maintain the
regexes, but on the other hand data mining/natural language processing
is a very wide field to enter.

You can build a classification system with Bayes, for which you need a lot
of good testing data. But you will probably end up in the same arms
race that spam filters fight.

http://en.wikipedia.org/wiki/Bayesian_spam_filtering
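
To make the Bayes idea a bit more concrete, here is a rough Python sketch of a naive Bayes classifier with add-one smoothing. It is an illustration under stated assumptions, not anything PHPIDS does: the class name `NaiveBayes`, the labels "attack" and "benign", the tokenizer, and the toy training strings are all made up for this example.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of letters, or single punctuation characters.
# Digits and whitespace are skipped.
TOKEN = re.compile(r"[a-z]+|[^\sa-z0-9]")


def tokenize(text):
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Toy two-class naive Bayes classifier over tokens."""

    LABELS = ("attack", "benign")  # hypothetical labels

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.sample_counts = Counter()

    def train(self, label, text):
        self.sample_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def classify(self, text):
        vocabulary = set()
        for counts in self.token_counts.values():
            vocabulary.update(counts)
        total_samples = sum(self.sample_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label in self.LABELS:
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Smoothed log prior, so an untrained label does not cause log(0).
            score = math.log(
                (self.sample_counts[label] + 1) / (total_samples + len(self.LABELS))
            )
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log(
                    (counts[token] + 1) / (total_tokens + len(vocabulary) + 1)
                )
            scores[label] = score
        return max(scores, key=scores.get)


classifier = NaiveBayes()
classifier.train("attack", "<script>alert(1)</script>")
classifier.train("attack", "' or 1=1 --")
classifier.train("benign", "hello world")
classifier.train("benign", "nice weather today")
```

The result depends entirely on the training data, which is the point made above: without a lot of good samples the classifier is easy to fool, and attackers can adapt their input just as spammers do.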

I just stumbled upon this paper describing the complexity of finding
personal names in texts:
http://www.bsys.monash.edu.au/people/cphua/papers/The%20Personal%20Name%20Problem%20v0.8.pdf

There was an interesting talk about applied machine learning at 22C3.
They showed how to separate source code in different languages or from
different authors with SVMs. It of course all ends up with brain pong :)
http://events.ccc.de/congress/2005/fahrplan/events/544.en.html


--
Timo Derstappen

http://teemow.com
mailto:tee...@gmail.com

Mario Heiderich

Sep 20, 2007, 5:26:25 PM
to php...@googlegroups.com
Hi teemow!


"Currently it is definitely Sisyphean work to maintain the regexes"

Currently yes, but I love it, and the amount of work is actually shrinking. That's more than I hoped for a few weeks ago.


"It's definitely an interesting topic to dive into, but I don't think that you will achieve anything as good"

That's what we plan to find out. There are certainly other interesting ways of detecting attack patterns that the PHPIDS doesn't detect yet, but they also have to be researched (which we plan to do soon). Many people told us not to work with 'blacklisting', but thanks to the user-submitted contributions we are pretty close to building a pretty bullet-proof solution. I am not a big fan of well-trodden paths in this area, so I think an uncommon approach can be promising after all. Let's talk about that over our daily dose of nicotine at the window tomorrow, mate, and maybe we should move the off-topic part I started in this thread to a separate one. Discussion on that issue is very much appreciated ;)
