regex on strings


mkeb...@gmail.com

Nov 25, 2013, 11:44:51 AM
to num...@googlegroups.com
(Yes, I know it's supposed to be a math-oriented library.)

Hello,

I'm doing server log analysis with PyTables, which apparently uses Numexpr for filtering tables (in Table.where conditions and the like).

Apart from purely numerical data, it would be very useful for me to be able to filter rows on regexes. At the moment it seems that only string comparison (equality and ordering) is supported.

Yes, I know this would severely limit performance, but I will pay that cost somewhere: either here, or elsewhere, where it would be even worse because I could not first filter rows on numerical conditions and would have to load every line before matching it against a regex.

The logic of stringcmp in interpreter.cpp is simple, but how can I connect string regex matching logic to the rest of it? I see there are opcodes in interp_body.cpp:

        case OP_GT_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) > 0));
        case OP_GE_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) >= 0));
        case OP_EQ_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) == 0));
        case OP_NE_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) != 0));


But I do not see where the condition is parsed, or how I can add an opcode for regex matching so that it is used when conditions are parsed.
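To make the question concrete, here is a rough sketch of the kind of helper I have in mind as a regex counterpart to stringcmp. It is not numexpr code: the names string_regex_search and regex_filter are hypothetical, the use of std::regex is my own choice, and I am assuming the strings are fixed-width, NUL-padded fields where the first NUL marks the end of the string.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not part of numexpr: returns true when the
// fixed-width field `s` (at most `maxlen` bytes) contains a match for `re`.
// Assumption: the field is NUL-padded and the first NUL ends the string.
bool string_regex_search(const char *s, std::size_t maxlen,
                         const std::regex &re)
{
    // A full-width field has no terminating NUL, so find the logical
    // length by hand instead of calling strlen.
    std::size_t len = 0;
    while (len < maxlen && s[len] != '\0') {
        ++len;
    }
    return std::regex_search(s, s + len, re);
}

// Hypothetical driver standing in for the per-row loop that an opcode
// would run: one pattern applied to `n` rows of `itemsize` bytes each.
void regex_filter(const char *rows, std::size_t n, std::size_t itemsize,
                  const std::string &pattern, bool *dest)
{
    const std::regex re(pattern);  // compile once, outside the row loop
    for (std::size_t i = 0; i < n; ++i) {
        dest[i] = string_regex_search(rows + i * itemsize, itemsize, re);
    }
}
```

Note that std::regex uses the ECMAScript grammar by default, and that compiling the pattern once per call rather than once per row is what keeps the cost tolerable. How such a function would be registered as an opcode is exactly the part I am missing.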





Francesc Alted

Nov 25, 2013, 3:31:53 PM
to num...@googlegroups.com
Well, I think you could use a recent patch that implements support for min and max as a guide for how to start on this:

http://code.google.com/r/adamdryan-numexpr/source/detail?spec=svn4590e94efa54857aa9619f0be0a2f95e44db0f24&r=4590e94efa54857aa9619f0be0a2f95e44db0f24#

Adding an operation like that is not difficult, but I don't know how much regex matching could complicate things.

HTH,

Francesc

-- 
Francesc Alted

mkeb...@gmail.com

Nov 26, 2013, 5:59:40 AM
to num...@googlegroups.com


Thank you!! That's exactly the missing piece I needed.