Forum: >>> Magnum BBS <<<

Re: Cutting slices

From aapost@21:1/5 to Stefan Ram on Sun Mar 5 17:59:39 2023

On 3/5/23 17:43, Stefan Ram wrote:

The following behaviour of Python strikes me as being a bit
"irregular". A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it. But for now, let's use a simple "find":

s = 'alpha.beta.gamma'
s[ 0: s.find( '.', 0 )]

|'alpha'

s[ 6: s.find( '.', 6 )]

|'beta'

s[ 11: s.find( '.', 11 )]

|'gamm'

. The user always inserted the position of the previous find plus
one to start the next "find", so he uses "0", "6", and "11".
But the "a" is missing from the final "gamma"!

And it seems that there is no numerical value at all that
one can use for "n" in "string[ 0: n ]" to get the whole
string, is there?

I would agree with 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16]
work ... as well as string[11:324242]... lol..
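As a minimal sketch of why all of these work (assuming the same s as in the original post): an omitted or None end means "up to the end", and an end past the end of the string is clamped to len(s). The names tails and clamped are only illustrative.

```python
s = 'alpha.beta.gamma'

# All four forms select the same final field.
tails = [s[11:], s[11:None], s[11:16], s[11:324242]]

# slice.indices() shows how an oversized end is clamped to len(s).
clamped = slice(11, 324242).indices(len(s))

print(tails)    # four copies of 'gamma'
print(clamped)  # (11, 16, 1)
```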

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Ram@21:1/5 to All on Sun Mar 5 22:43:51 2023

The following behaviour of Python strikes me as being a bit
"irregular". A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it. But for now, let's use a simple "find":

s = 'alpha.beta.gamma'
s[ 0: s.find( '.', 0 )]

|'alpha'

s[ 6: s.find( '.', 6 )]

|'beta'

s[ 11: s.find( '.', 11 )]

|'gamm'

. The user always inserted the position of the previous find plus
one to start the next "find", so he uses "0", "6", and "11".
But the "a" is missing from the final "gamma"!

And it seems that there is no numerical value at all that
one can use for "n" in "string[ 0: n ]" to get the whole
string, is there?


From Rob Cliffe@21:1/5 to aapost on Mon Mar 6 00:36:57 2023

On 05/03/2023 22:59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":
   |>>> s = 'alpha.beta.gamma'

s[ 0: s.find( '.', 0 )]

|'alpha'

s[ 6: s.find( '.', 6 )]

|'beta'

s[ 11: s.find( '.', 11 )]

|'gamm'

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
      And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?

The final `find` returns -1 because there is no separator after 'gamma'.
So you are asking for
    s[ 11 : -1]
which correctly returns 'gamm'.
You need to test for this condition.
Alternatively you could ensure that there is a final separator:
    s = 'alpha.beta.gamma.'
but you would still need to test when the string was exhausted.
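A minimal sketch of such a test, assuming a single fixed separator string; the generator name fields is hypothetical and not from the original post:

```python
def fields(s, sep='.'):
    """Yield the pieces of s between occurrences of sep, using str.find."""
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the rest of the string is the last field.
            yield s[start:]
            return
        yield s[start:end]
        start = end + len(sep)

s = 'alpha.beta.gamma'
print(s.find('.', 11))   # -1, so s[11:-1] drops the final character
print(list(fields(s)))   # ['alpha', 'beta', 'gamma']
```

With a trailing separator, as in 'alpha.beta.gamma.', the same loop yields a final empty string, which is one way to notice that the input is exhausted.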
Best wishes
Rob Cliffe


From dn@21:1/5 to aapost on Mon Mar 6 13:28:01 2023

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

s = 'alpha.beta.gamma'
s[ 0: s.find( '.', 0 )]

|'alpha'

s[ 6: s.find( '.', 6 )]

|'beta'

s[ 11: s.find( '.', 11 )]

|'gamm'

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?

I would agree with 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16]
work ... as well as string[11:324242]... lol..

To expand on the above, answering the OP's second question: the numeric
value is len( s ).

If the repetitive process is required, try a loop like:

>>> start_index = 11 #to cure the issue-raised
>>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'
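The snippet above handles only the final field. Wrapped in an actual loop, the same idea might look like the following sketch, which assumes a one-character separator; the names pieces and end_index are illustrative.

```python
s = 'alpha.beta.gamma'

pieces = []
start_index = 0
while True:
    try:
        end_index = s.index('.', start_index)
    except ValueError:
        # No separator left: take everything up to len(s) and stop.
        pieces.append(s[start_index:len(s)])
        break
    pieces.append(s[start_index:end_index])
    start_index = end_index + 1   # step past the one-character separator

print(pieces)  # ['alpha', 'beta', 'gamma']
```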

However, if the objective is to split, then use the function built for
the purpose:

s.split( "." )

['alpha', 'beta', 'gamma']

(yes, the OP says this won't work - but doesn't show why)

If life must be more complicated, but the next separator can be
predicted, then its close relative is partition().
NB: both split() and partition() can be applied to the sub-strings
produced by an earlier split() or ... i.e. there may be no reason to
work strictly from left to right.
I can't really help further with this because the information above
only shows multiple "." characters, and not how multiple separators
might be interpreted.
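For illustration only, a sketch of peeling off one field at a time with str.partition(), assuming the separator is known in advance; the function name split_with_partition is hypothetical.

```python
def split_with_partition(text, sep='.'):
    """Split text on sep by repeatedly calling str.partition."""
    pieces = []
    rest = text
    while True:
        head, found, rest = rest.partition(sep)
        pieces.append(head)
        if not found:
            # partition() returns '' as the separator when sep is absent,
            # so head was the last field.
            break
    return pieces

print(split_with_partition('alpha.beta.gamma'))  # ['alpha', 'beta', 'gamma']
```

Because partition() always returns a 3-tuple, there is no -1 or exception to handle; the empty middle element signals the end.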

A straight-line approach might be to use maketrans() and translate() to
convert all the separators to a single character, eg white-space, which
can then be split using any of the previously-mentioned methods.
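A minimal sketch of that idea, assuming a made-up set of single-character separators ('.', '-', '=' and '+' are not taken from the original question):

```python
separators = '.-=+'   # hypothetical separator characters

# Map every separator character to a space.
table = str.maketrans(separators, ' ' * len(separators))

s = 'alpha.beta--gamma=delta'
pieces = s.translate(table).split()

print(pieces)  # ['alpha', 'beta', 'gamma', 'delta']
```

Note that split() without arguments also collapses runs of white-space, so this variant cannot represent empty fields and will also split on any spaces already present in the data.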

If the problem is sufficiently complicated and the OP is prepared to go
whole-hog, then PSL's tokenize library or various parser libraries may
be worth consideration...

--
Regards,
=dn


From MRAB@21:1/5 to dn via Python-list on Mon Mar 6 01:56:35 2023

On 2023-03-06 00:28, dn via Python-list wrote:

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

s = 'alpha.beta.gamma'
s[ 0: s.find( '.', 0 )]

|'alpha'

s[ 6: s.find( '.', 6 )]

|'beta'

s[ 11: s.find( '.', 11 )]

|'gamm'

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?

I would agree with 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16]
work ... as well as string[11:324242]... lol..

To expand on the above, answering the OP's second question: the numeric
value is len( s ).

If the repetitive process is required, try a loop like:

>>> start_index = 11 #to cure the issue-raised

>>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'

Somewhat off-topic, but...

When there was a discussion about a None-coalescing operator, I thought
that it would've been nice if .find and .rfind returned None instead of -1.

There have been times when I've wanted to find the next space (or
whatever) and have it return the length of the string if absent. That
could've been accomplished with:

s.find(' ', pos) ?? len(s)

Other times I've wanted it to return -1. That could've been accomplished
with:

s.find(' ', pos) ?? -1

(There's a place in the re module where .rfind returning -1 is just the
right value.)

In this instance, slicing with None as the end is just what's wanted.
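The ?? operator is not part of Python, so the lines above cannot be run as written. As a rough sketch of the same effect today, a small wrapper (find_or_none is a hypothetical name, not a standard method) can turn -1 into None so that the result works directly as a slice end:

```python
def find_or_none(s, sub, pos=0):
    """Like str.find, but return None instead of -1 (hypothetical helper)."""
    index = s.find(sub, pos)
    return None if index == -1 else index

s = 'alpha.beta.gamma'
print(s[6:find_or_none(s, '.', 6)])    # beta
print(s[11:find_or_none(s, '.', 11)])  # gamma

# Choosing a different default where one is needed:
end = find_or_none(s, ' ', 0)
end = len(s) if end is None else end
print(end)  # 16
```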

Ah, well...


From Greg Ewing@21:1/5 to Stefan Ram on Mon Mar 6 15:18:29 2023

On 6/03/23 11:43 am, Stefan Ram wrote:

A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it.

What's wrong with re.split() in that case?

--
Greg


From [email protected]@21:1/5 to Stefan Ram on Sun Mar 5 23:01:52 2023

I am not commenting on the technique or why it is chosen, just on the part where the last search looks for a non-existent period:

s = 'alpha.beta.gamma'
...
s[ 11: s.find( '.', 11 )]

What should "find" do if it hits the end of a string without finding the
period you claim is a divider?

Could that be why gamma got truncated?

Unless you can arrange for a terminal period, maybe you can reconsider the approach.



From Christian Gollwitzer@21:1/5 to All on Mon Mar 6 09:07:32 2023

Am 05.03.23 um 23:43 schrieb Stefan Ram:

The following behaviour of Python strikes me as being a bit
"irregular". A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it.

OK, so if you want to use an RE for splitting, can you not use
re.split()? It basically works like the built-in splitting in AWK:

s='alphaAbetaBgamma'
import re
re.split(r'A|B|C', s)

['alpha', 'beta', 'gamma']

Christian


From Stefan Ram@21:1/5 to Greg Ewing on Mon Mar 6 10:50:47 2023

Greg Ewing <[email protected]> writes:

On 6/03/23 11:43 am, Stefan Ram wrote:

A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it.

What's wrong with re.split() in that case?

Thanks for all the answers!

What's wrong with re.split()?

First of all, I was not aware at the moment that regular
expressions are indeed allowed in "re.split". But I am
kind of "parsing" a language where there is a separator
with several options, and my gut tells me to use an
iterative approach, where I find the next candidate for
a separator, then do a more detailed analysis, and finally
remove the part up to that separator from the string and
look for the next separator candidate. - So, I prefer not
to use a single split call for this as this would be less
clear to me.
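One possible shape for such an iterative approach, sketched with a compiled pattern whose search() method accepts a start position. The separator pattern and the name pieces are invented for illustration, and the pattern is assumed never to match the empty string.

```python
import re

# Hypothetical separator candidates: '.', ';' or '--'.
SEPARATOR = re.compile(r'[.;]|--')

def pieces(text):
    """Yield the parts of text between separator matches, left to right."""
    start = 0
    while True:
        match = SEPARATOR.search(text, start)
        if match is None:
            # No further candidate: the remainder is the last piece.
            yield text[start:]
            return
        # A more detailed analysis of match.group() could go here.
        yield text[start:match.start()]
        start = match.end()

print(list(pieces('alpha.beta--gamma;delta')))
# ['alpha', 'beta', 'gamma', 'delta']
```

Here search() returns None when nothing is found, so the -1 pitfall of str.find does not arise, and no part of the string has to be copied or removed between steps.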

Yes, indeed, one can use "len" as a value in a slice to copy
till the end of a string.

'abc'[ 0: len( 'abc' )]

|'abc'

I was not aware that "None" is also possible.

There is no single fixed integer value, though, and Python
does not even have an "int.max" value one could use. But in
practice, a very large int value should do.

'abc'[ 0: int(9E99) ]

|'abc'
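As a side note, Python ints are unbounded, which is why there is no "int.max". The closest standard constant is probably sys.maxsize, the largest value a container length or index can take on the running platform. A short sketch comparing the options:

```python
import sys

text = 'abc'

print(text[0:sys.maxsize])  # abc
print(text[0:10**100])      # abc  (oversized slice ends are clamped)
print(text[0:None])         # abc  (None means "to the end")
```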


From moi@21:1/5 to All on Mon Mar 6 06:44:51 2023

Le lundi 6 mars 2023 à 09:07:53 UTC+1, Christian Gollwitzer a écrit :

Am 05.03.23 um 23:43 schrieb Stefan Ram:

The following behaviour of Python strikes me as being a bit
"irregular". A user tries to chop off sections from a string,
but does not use "split" because the separator might become
more complicated so that a regular expression will be required
to find it.

OK, so if you want to use an RE for splitting, can you not use
re.split()? It basically works like the built-in splitting in AWK:

s='alphaAbetaBgamma'
import re
re.split(r'A|B|C', s)

['alpha', 'beta', 'gamma']

Christian

s = 'alpha.beta.gamma'; trenne(s)

['alpha', 'beta', 'gamma']

s = 'alpha---beta gamma'; trenne(s)

['alpha', 'beta', 'gamma']

s = 'alpha---beta gamma999'; trenne(s)

['alpha', 'beta', 'gamma']

s = '1 tau===beta+omega '; trenne(s)

['tau', 'beta', 'omega']

s = 'AalphaBbetaGgamma'; trenne(s)

['alpha', 'beta', 'gamma']

s = 'a.😁bc\u1234xy z'; trenne(s)

['a', 'bc', 'xy', 'z']
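The definition of trenne() is not shown in the post. Purely as a guess, one function that happens to reproduce all six results above keeps runs of lowercase ASCII letters and discards everything else; the actual implementation may well differ.

```python
import re

def trenne(s):
    # Hypothetical reconstruction, not the poster's code:
    # return every run of lowercase ASCII letters in s.
    return re.findall(r'[a-z]+', s)

print(trenne('alpha---beta gamma999'))  # ['alpha', 'beta', 'gamma']
print(trenne('1 tau===beta+omega '))    # ['tau', 'beta', 'omega']
```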

