Re: Cutting slices

aapost

Mar 5, 2023, 5:59:59 PM

On 3/5/23 17:43, Stefan Ram wrote:
> The following behaviour of Python strikes me as being a bit
> "irregular". A user tries to chop off sections from a string,
> but does not use "split" because the separator might become
> more complicated so that a regular expression will be required
> to find it. But for now, let's use a simple "find":
>
> |>>> s = 'alpha.beta.gamma'
> |>>> s[ 0: s.find( '.', 0 )]
> |'alpha'
> |>>> s[ 6: s.find( '.', 6 )]
> |'beta'
> |>>> s[ 11: s.find( '.', 11 )]
> |'gamm'
> |>>>
>
> . The user always inserted the position of the previous find plus
> one to start the next "find", so he uses "0", "6", and "11".
> But the "a" is missing from the final "gamma"!
>
> And it seems that there is no numerical value at all that
> one can use for "n" in "string[ 0: n ]" to get the whole
> string, isn't it?
>
>

I would agree with the first part of the comment.

Just noting that string[11:], string[11:None], and string[11:16] all
work ... as does string[11:324242]... lol..
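A minimal sketch of the slices mentioned above, using the OP's string. The variable names are only illustrative; the point is that an omitted or None end bound runs to the end of the string, and an end bound past the end is clamped rather than raising an error.

```python
s = 'alpha.beta.gamma'

# An omitted or None end bound means "up to the end of the string".
tail_open = s[11:]
tail_none = s[11:None]

# len(s) is the exact numeric end bound; for this string it is 16.
tail_len = s[11:len(s)]

# Out-of-range slice bounds are clamped instead of raising IndexError.
tail_big = s[11:324242]

print(tail_open, tail_none, tail_len, tail_big)
```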

dn

Mar 5, 2023, 7:30:50 PM

To expand on the above, answering the OP's second question: the numeric
value is len( s ).

If the repetitive process is required, try something like this inside a
loop:

>>> start_index = 11  # to cure the issue raised
>>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'
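For what it's worth, the same try/except idea can be wrapped in an actual loop. This is only a sketch: chop() is a hypothetical helper name, not something from the thread, and it assumes a non-empty separator.

```python
def chop(s, sep='.'):
    """Yield the pieces of s between occurrences of sep (hypothetical helper).

    Assumes sep is non-empty; an empty sep would never advance.
    """
    start = 0
    while True:
        try:
            end = s.index(sep, start)
        except ValueError:
            # No further separator: the rest of the string is the last piece.
            yield s[start:len(s)]
            return
        yield s[start:end]
        # Continue just past the separator that was found.
        start = end + len(sep)


pieces = list(chop('alpha.beta.gamma'))
print(pieces)
```

Because index() raises ValueError instead of returning -1, the "no more separators" case cannot silently turn into a slice that drops the last character.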

However, if the objective is to split, then use the function built for
the purpose:

>>> s.split( "." )
['alpha', 'beta', 'gamma']

(yes, the OP says this won't work - but doesn't show why)

If life must be more complicated, but the next separator can be
predicted, then split()'s close relative is partition().
NB you can use both split() and partition() on the sub-strings produced
by an earlier split() or ... ie there may be no reason to work strictly
from left to right.
I can't really help further with this, because the information above
only shows multiple "." characters, and not how multiple separators
might be interpreted.
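A small sketch of how partition() behaves, again on the OP's string (the variable names are just for illustration):

```python
s = 'alpha.beta.gamma'

# partition() splits at the first separator and always returns a 3-tuple:
# (text before, the separator itself, text after).
head, sep, rest = s.partition('.')

# When the separator is absent, the whole string comes back as the first
# item and the other two are empty strings, so there is no -1 to trip over.
last, missing_sep, nothing = 'gamma'.partition('.')

print(head, rest)
print(repr(last), repr(missing_sep), repr(nothing))
```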

A straight-line approach might be to use maketrans() and translate() to
convert all the separators to a single character, eg white-space, which
can then be split using any of the previously-mentioned methods.
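As a rough sketch of that idea: the separator set below is made up for the example, since the thread never says which separators are actually expected.

```python
s = 'alpha---beta gamma+delta'

# Hypothetical separator characters, chosen only for illustration.
separators = '-+'

# Map every separator character to a space...
table = str.maketrans(separators, ' ' * len(separators))
normalised = s.translate(table)

# ...then let split() with no argument collapse runs of whitespace.
words = normalised.split()
print(words)
```

This only works when every separator is a single character; multi-character separators would need a regular expression or repeated replace() calls instead.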

If the problem is sufficiently complicated and the OP is prepared to go
whole-hog, then the Python Standard Library's tokenize module or one of
the various parser libraries may be worth consideration...

--
Regards,
=dn

Rob Cliffe

Mar 5, 2023, 7:37:51 PM

On 05/03/2023 22:59, aapost wrote:
> On 3/5/23 17:43, Stefan Ram wrote:
>> The following behaviour of Python strikes me as being a bit
>> "irregular". A user tries to chop off sections from a string,
>> but does not use "split" because the separator might become
>> more complicated so that a regular expression will be required
>> to find it. But for now, let's use a simple "find":
>>
>> |>>> s = 'alpha.beta.gamma'
>> |>>> s[ 0: s.find( '.', 0 )]
>> |'alpha'
>> |>>> s[ 6: s.find( '.', 6 )]
>> |'beta'
>> |>>> s[ 11: s.find( '.', 11 )]
>> |'gamm'
>> |>>>
>>
>> . The user always inserted the position of the previous find plus
>> one to start the next "find", so he uses "0", "6", and "11".
>> But the "a" is missing from the final "gamma"!
>>
>> And it seems that there is no numerical value at all that
>> one can use for "n" in "string[ 0: n ]" to get the whole
>> string, isn't it?
>>
>>
>

The final `find` returns -1 because there is no separator after 'gamma'.
So you are asking for
s[ 11 : -1]
which correctly returns 'gamm'.
You need to test for this condition.
Alternatively you could ensure that there is a final separator:
s = 'alpha.beta.gamma.'
but you would still need to test when the string was exhausted.
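To make that concrete, here is a short sketch of what the last step does and one way to test for the "not found" case before slicing (variable names are illustrative only):

```python
s = 'alpha.beta.gamma'

# There is no '.' at or after index 11, so find() returns -1.
end = s.find('.', 11)
not_found = end

# s[11:-1] stops just before the last character, hence 'gamm'.
truncated = s[11:end]

# Test for the "not found" value and fall back to the end of the string.
if end == -1:
    end = len(s)
whole = s[11:end]

print(not_found, truncated, whole)
```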
Best wishes
Rob Cliffe

MRAB

Mar 5, 2023, 8:57:03 PM

On 2023-03-06 00:28, dn via Python-list wrote:
> On 06/03/2023 11.59, aapost wrote:

> To expand on the above, answering the OP's second question: the numeric
> value is len( s ).
>
> If the repetitive process is required, try something like this inside
> a loop:
>
> >>> start_index = 11  # to cure the issue raised
> >>> try:
> ...     s[ start_index:s.index( '.', start_index ) ]
> ... except ValueError:
> ...     s[ start_index:len( s ) ]
> ...
> 'gamma'
>

Somewhat off-topic, but...

When there was a discussion about a None-coalescing operator, I thought
that it would've been nice if .find and .rfind returned None instead of -1.

There have been times when I've wanted to find the next space (or
whatever) and have it return the length of the string if absent. That
could've been accomplished with:

s.find(' ', pos) ?? len(s)

Other times I've wanted it to return -1. That could've been accomplished
with:

s.find(' ', pos) ?? -1

(There's a place in the re module where .rfind returning -1 is just the
right value.)

In this instance, slicing with None as the end is just what's wanted.

Ah, well...
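Since Python has no ?? operator, the "None instead of -1" behaviour can only be approximated with a small wrapper. The following is a sketch under that assumption; find_or_none() is a hypothetical name, not an existing str method.

```python
def find_or_none(s, sub, pos=0):
    """Like str.find, but return None when sub is absent (hypothetical helper)."""
    index = s.find(sub, pos)
    return None if index == -1 else index


s = 'alpha.beta.gamma'

# A found separator gives an ordinary end index...
middle = s[6:find_or_none(s, '.', 6)]

# ...and a missing one gives None, which slices through to the end.
last = s[11:find_or_none(s, '.', 11)]

# The "length if absent" variant can be spelled with an explicit None test.
found = find_or_none(s, ' ', 0)
end_or_len = len(s) if found is None else found

print(middle, last, end_or_len)
```

Note that `find_or_none(...) or len(s)` would not be a safe substitute for the explicit test, because a match at index 0 is falsy.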

Greg Ewing

Mar 5, 2023, 9:18:48 PM

On 6/03/23 11:43 am, Stefan Ram wrote:
> A user tries to chop off sections from a string,
> but does not use "split" because the separator might become
> more complicated so that a regular expression will be required
> to find it.

What's wrong with re.split() in that case?

--
Greg

avi.e...@gmail.com

Mar 5, 2023, 11:02:18 PM

I am not commenting on the technique or why it was chosen, just on the
part where the last search looks for a non-existent period:

s = 'alpha.beta.gamma'
...

s[ 11: s.find( '.', 11 )]

What should "find" do if it hits the end of a string without finding the
period you claim is a divider?

Could that be why gamma got truncated?

Unless you can arrange for a terminal period, maybe you can reconsider the
approach.

--
https://mail.python.org/mailman/listinfo/python-list

Christian Gollwitzer

Mar 6, 2023, 3:07:53 AM

Am 05.03.23 um 23:43 schrieb Stefan Ram:

> The following behaviour of Python strikes me as being a bit
> "irregular". A user tries to chop off sections from a string,
> but does not use "split" because the separator might become
> more complicated so that a regular expression will be required
> to find it.

OK, so if you want to use an RE for splitting, can you not use
re.split()? It basically works like the built-in splitting in AWK:

>>> s='alphaAbetaBgamma'
>>> import re
>>> re.split(r'A|B|C', s)
['alpha', 'beta', 'gamma']
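To illustrate how this extends to a "more complicated" separator, here is a sketch with a made-up pattern; the thread never specifies the real separator, so the character class below is purely an assumption.

```python
import re

s = 'alpha.beta--gamma delta'

# Hypothetical separator: one or more dots, hyphens, or whitespace characters.
separator = re.compile(r'[.\-\s]+')

parts = separator.split(s)
print(parts)

# Wrapping the pattern in a capturing group keeps the separators in the result.
with_seps = re.split(r'([.\-\s]+)', s)
print(with_seps)
```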

Christian

moi

Mar 6, 2023, 9:45:08 AM

>>> s = 'alpha.beta.gamma'; trenne(s)
['alpha', 'beta', 'gamma']
>>> s = 'alpha---beta gamma'; trenne(s)
['alpha', 'beta', 'gamma']
>>> s = 'alpha---beta gamma999'; trenne(s)
['alpha', 'beta', 'gamma']
>>> s = '1 tau===beta+omega '; trenne(s)
['tau', 'beta', 'omega']
>>> s = 'AalphaBbetaGgamma'; trenne(s)
['alpha', 'beta', 'gamma']
>>> s = 'a.😁bc\u1234xy z'; trenne(s)
['a', 'bc', 'xy', 'z']
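The post does not show how trenne() is defined. One guess that happens to reproduce every output above is "collect the runs of ASCII lowercase letters"; the definition below is that assumption only, not the poster's actual code.

```python
import re


def trenne(s):
    """Guess at the helper used above: return runs of ASCII lowercase letters.

    This is an assumed definition; the original post does not include one.
    """
    return re.findall(r'[a-z]+', s)


print(trenne('alpha---beta gamma999'))
print(trenne('AalphaBbetaGgamma'))
```

Under this guess everything that is not a-z (digits, capitals, punctuation, emoji) acts as a separator, which is the opposite approach to naming the separators explicitly.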