bean-query grep() function


shreedha...@gmail.com

Nov 12, 2018, 4:47:15 PM
to Beancount
Hi,

Just noticed that the implementation of the grep function in bean-query doesn't make sense to me from the description:

class Grep(query_compile.EvalFunction):
    "Match a group against a string and return only the matched portion."
    __intypes__ = [str, str]

    def __init__(self, operands):
        super().__init__(operands, str)

    def __call__(self, context):
        args = self.eval_args(context)
        match = re.search(args[0], args[1])
        if match:
            return match.group(0)

According to the description I think it should do:
        if match:
            # Get the first matched group; group(0) matches entire string
            return match.group(1)


or even:
        if match:
            # Get the last matched group or entire string if there are no groups
            return match.group(len(match.groups()))


If it is implemented as intended, it would be nice to have an overloaded grep() function that takes a third parameter of type int for the group id. I can send a patch for that if you'd like, although I think the second implementation should work for both styles:

>>> import re
>>> m = re.search('a (b) c', 'asda b c')
>>> m.group(len(m.groups()))
'b'
>>> m = re.search('a b c', 'asda b c')
>>> m.group(len(m.groups()))
'a b c'
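
The same idea can be sketched as a standalone function. This is only an illustration: `grep_last_group` is a hypothetical name, not something in bean-query, and returning None on a failed match is an assumption. It also shows one edge case worth knowing about: if the last group is optional and did not take part in the match, the result is None rather than the matched portion.

```python
import re


def grep_last_group(pattern, string):
    """Return the last captured group, or the whole match if there are no groups.

    Hypothetical helper for illustration only; not part of bean-query.
    """
    match = re.search(pattern, string)
    if match is None:
        return None
    # match.groups() has one entry per group in the pattern (empty if none),
    # so its length indexes the last group, or group 0 (the whole match).
    return match.group(len(match.groups()))


print(grep_last_group('a (b) c', 'asda b c'))   # pattern with one subgroup
print(grep_last_group('a b c', 'asda b c'))     # pattern without subgroups
print(grep_last_group('(a)(b)?', 'ac'))         # optional last group did not match
```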

Thanks,
Shreedhar


Martin Blais

Nov 12, 2018, 8:47:07 PM
to bean...@googlegroups.com
bergamot [hg|default]:~/p/invest/options$ python3
Python 3.7.0 (default, Jul 30 2018, 01:44:42) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.search('a+', 'cccaadd').group(0)
'aa'
>>> 

'aa' is the matched portion.
WAI (working as intended).
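
To make the distinction in this exchange concrete, here is a small sketch contrasting the whole match, which is what group(0) returns, with the parenthesized subgroups. The pattern and input are made up for illustration and are not taken from Beancount.

```python
import re

# Illustrative pattern with two parenthesized subgroups.
match = re.search(r'(a+)(d+)', 'cccaadd')

whole = match.group(0)    # the entire matched portion
first = match.group(1)    # text captured by the first pair of parentheses
second = match.group(2)   # text captured by the second pair of parentheses

print(whole, first, second)
```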




Patches welcome.




--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/5b9484fc-1bd0-4e8c-81b4-f6caa2877cea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shreedhar Hardikar

Nov 12, 2018, 9:41:30 PM
to bean...@googlegroups.com
Ah, so you had no use for the subgroups - which are captured using the parentheses? 
 




 
Anyway, the one-line change I suggested would work for your scenario too. If there are no parenthesized groups, it just returns the matched portion (as it does now); otherwise it returns the last group in the pattern string. I'm not sure it makes sense to add an integer selector: since GREP returns only one string, why would one have a pattern with multiple subgroups?

Anyway, I've attached a patch with some tests.

- Shreedhar

 


grep-subgroup.patch

Shreedhar Hardikar

Nov 12, 2018, 9:59:46 PM
to bean...@googlegroups.com
Well, I thought I'd implement it anyway:

GREPN(pattern, string, N):
The pattern string can contain a parenthesized subgroup.  The function returns
only the Nth subgroup of the matched string. N=0 returns the matched portion of
the string, ignoring any parenthesized subgroups.

I guess one could put the pattern as a metadata item and then use this function to select the subgroup.
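
As a rough standalone sketch of the GREPN behavior described above (not the code in the attached patch): the function name `grepn` and the choice to return None for a failed match or an out-of-range N are assumptions made here for illustration.

```python
import re


def grepn(pattern, string, n):
    """Return the Nth subgroup of the first match; N=0 returns the whole match.

    Standalone sketch only. The attached patch may handle a failed match
    or an out-of-range N differently.
    """
    match = re.search(pattern, string)
    if match is None:
        return None
    # Assumption: an out-of-range group index yields None instead of raising.
    if n < 0 or n > len(match.groups()):
        return None
    return match.group(n)


print(grepn(r'(\d+)-(\d+)', 'order 123-456', 0))  # whole matched portion
print(grepn(r'(\d+)-(\d+)', 'order 123-456', 2))  # second subgroup
```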
 

grepn-subgroup.patch

Martin Blais

Nov 15, 2018, 1:17:28 AM
to bean...@googlegroups.com
Thanks Shreedhar.
I'll merge your patch in the next few weeks (I have some time off coming up).


shreedha...@gmail.com

Nov 18, 2018, 1:55:16 PM
to Beancount




Sounds good. I was also able to figure out the PR system with hg/bitbucket, so I converted this into a PR: https://bitbucket.org/blais/beancount/pull-requests/85/support-subgroup-selection-using-grepn/diff - so that it's easier to review and merge. More PRs on their way.

Also, in case this makes any difference, I'm much more comfortable with git, which means that for now hg mostly just gets in my way. It should get easier as I figure out Mercurial's model.

Cheers.

-Shreedhar 