• Re: Regular Expression bug?

    From [email protected]@21:1/5 to jose isaias cabrera on Thu Mar 2 14:29:37 2023
    On 2023-03-02 at 14:22:41 -0500,
    jose isaias cabrera <[email protected]> wrote:

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.

    The bug is in your regular expression; the plus modifier is greedy.

    If you want to match up to the first space, then you'll need something
    like [^ ] (i.e., everything that isn't a space) instead of that dot.
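
For anyone following along, here is a minimal sketch of the difference, using the string from the original post. The variable names `greedy` and `first_word` are only illustrative and do not come from the thread.

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Greedy: .+ first grabs everything, then backs off only as far as needed
# to find a space, so it stops before the LAST space in the string.
greedy = re.match(r"pn=(.+) ", s).group(1)

# Negated class: [^ ]+ cannot cross a space, so it stops at the FIRST one.
first_word = re.match(r"pn=([^ ]+) ", s).group(1)

print(greedy)      # align upgrade
print(first_word)  # align
```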

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jose isaias cabrera@21:1/5 to All on Thu Mar 2 14:22:41 2023
    Greetings.

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.

    josé

    --

    What if eternity is real? Where will you spend it? Hmmmm...

  • From Chris Angelico@21:1/5 to jose isaias cabrera on Fri Mar 3 06:28:22 2023
    On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <[email protected]> wrote:

    Greetings.

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.


    Not a bug. Find the longest possible match that fits this; as long as
    you can find a space immediately after it, everything in between goes
    into the .+ part.

    If you want to exclude spaces, either use [^ ]+ or .+?.

    ChrisA
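
A short sketch of the non-greedy form mentioned above, again on the original string. One caveat worth showing: `.+?` matches as little as it can, so it only reaches the first space because the pattern requires a space after the group. The names `lazy` and `lazy_unanchored` are made up for illustration.

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Non-greedy with a trailing space: expands one character at a time
# until a space follows, so the group ends at the first space.
lazy = re.match(r"pn=(.+?) ", s).group(1)

# Non-greedy with nothing after the group: one character is enough.
lazy_unanchored = re.match(r"pn=(.+?)", s).group(1)

print(lazy)             # align
print(lazy_unanchored)  # a
```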

  • From Mats Wichmann@21:1/5 to Chris Angelico on Thu Mar 2 12:37:30 2023
    On 3/2/23 12:28, Chris Angelico wrote:
    On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <[email protected]> wrote:

    Greetings.

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.


    Not a bug. Find the longest possible match that fits this; as long as
    you can find a space immediately after it, everything in between goes
    into the .+ part.

    If you want to exclude spaces, either use [^ ]+ or .+?.


    https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

  • From jose isaias cabrera@21:1/5 to [email protected] on Thu Mar 2 14:38:27 2023
    On Thu, Mar 2, 2023 at 2:32 PM <[email protected]> wrote:

    On 2023-03-02 at 14:22:41 -0500,
    jose isaias cabrera <[email protected]> wrote:

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.

    The bug is in your regular expression; the plus modifier is greedy.

    If you want to match up to the first space, then you'll need something
    like [^ ] (i.e., everything that isn't a space) instead of that dot.

    Thanks. I appreciate your wisdom.

    josé
    --

    What if eternity is real? Where will you spend it? Hmmmm...

  • From [email protected]@21:1/5 to All on Thu Mar 2 17:40:36 2023
    José,

    Matching can be greedy. Did it match to the last space?

    What you want is a pattern that matches anything except a space (or whitespace), followed by a match for a space or something similar.

    Or use a construct that makes matching non-greedy.

    Avi
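
To illustrate the "space (or whitespace)" distinction, here is a small sketch. The input with a tab in it is a hypothetical variation on the original string, not something from the thread.

```python
import re

# Hypothetical input: a tab, not a space, follows "align".
s = "pn=align\tupgrade sd=2023-02-"

# [^ ]+ excludes only the space character, so it runs through the tab.
not_space = re.match(r"pn=([^ ]+)", s).group(1)

# \S+ excludes every whitespace character, so it stops at the tab.
not_whitespace = re.match(r"pn=(\S+)", s).group(1)

print(repr(not_space))       # 'align\tupgrade'
print(repr(not_whitespace))  # 'align'
```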

  • From jose isaias cabrera@21:1/5 to [email protected] on Thu Mar 2 20:06:59 2023
    On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <[email protected]> wrote:

    On 3/2/23 12:28, Chris Angelico wrote:
    On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <[email protected]> wrote:

    Greetings.

    For the RegExp Gurus, consider the following python3 code:
    <code>
    import re
    s = "pn=align upgrade sd=2023-02-"
    ro = re.compile(r"pn=(.+) ")
    r0=ro.match(s)
    print(r0.group(1))
    align upgrade
    </code>

    This is wrong. It should be 'align' because the group only goes up-to
    the space. Thoughts? Thanks.


    Not a bug. Find the longest possible match that fits this; as long as
    you can find a space immediately after it, everything in between goes
    into the .+ part.

    If you want to exclude spaces, either use [^ ]+ or .+?.

    https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

    This re is a bit different than the one I am used to. So, I am trying to match everything after 'pn=':

    import re
    s = "pm=jose pn=2017"
    m0 = r"pn=(.+)"
    r0 = re.compile(m0)
    s0 = r0.match(s)
    print(s0)
    None

    Any help is appreciated.

  • From [email protected]@21:1/5 to [email protected] on Thu Mar 2 20:35:12 2023
    It is a well-known fact, Jose, that GIGO.

    The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.


    s = "pn=jose pn=2017"
    ...
    s0 = r0.match(s)
    s0
    <re.Match object; span=(0, 15), match='pn=jose pn=2017'>



  • From Cameron Simpson@21:1/5 to jose isaias cabrera on Fri Mar 3 12:30:45 2023
    On 02Mar2023 20:06, jose isaias cabrera <[email protected]> wrote:
    This re is a bit different than the one I am used to. So, I am trying to
    match everything after 'pn=':

    import re
    s = "pm=jose pn=2017"
    m0 = r"pn=(.+)"
    r0 = re.compile(m0)
    s0 = r0.match(s)

    `match()` matches at the start of the string. You want r0.search(s).
    - Cameron Simpson <[email protected]>
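
A minimal sketch of the `match()` versus `search()` difference on the string from the post. The name `m` is only illustrative.

```python
import re

s = "pm=jose pn=2017"
r0 = re.compile(r"pn=(.+)")

# match() only tries at position 0, where the string has "pm=", not "pn=".
print(r0.match(s))  # None

# search() scans forward and finds "pn=" later in the string.
m = r0.search(s)
print(m.group(1))   # 2017
```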

  • From Alan Bawden@21:1/5 to All on Thu Mar 2 20:48:52 2023
    jose isaias cabrera <[email protected]> writes:

    On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <[email protected]> wrote:

    This re is a bit different than the one I am used to. So, I am trying to match
    everything after 'pn=':

    import re
    s = "pm=jose pn=2017"
    m0 = r"pn=(.+)"
    r0 = re.compile(m0)
    s0 = r0.match(s)
    >>> print(s0)
    None

    Assuming that you were expecting to match "pn=2017", then you probably
    don't want the 'match' method. Read its documentation. Then read the
    documentation for the _other_ methods that a Pattern supports. Then you
    will be enlightened.

    - Alan
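
As a rough tour of two of those other Pattern methods, here is a sketch using `findall()` and `fullmatch()`. The `key=value` pattern and the names `pairs` and `fields` are assumptions made for illustration, not anything proposed in the thread.

```python
import re

s = "pm=jose pn=2017"

# Hypothetical pattern: a word, an equals sign, then a run of non-whitespace.
pairs = re.compile(r"(\w+)=(\S+)")

# findall() returns every non-overlapping match; with two groups,
# each result is a (key, value) tuple.
fields = dict(pairs.findall(s))
print(fields)         # {'pm': 'jose', 'pn': '2017'}
print(fields["pn"])   # 2017

# fullmatch() succeeds only if the pattern covers the whole string,
# which a single key=value pattern cannot do here.
print(pairs.fullmatch(s))  # None
```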

  • From jose isaias cabrera@21:1/5 to [email protected] on Thu Mar 2 22:35:13 2023
    On Thu, Mar 2, 2023 at 8:35 PM <[email protected]> wrote:

    It is a well-known fact, Jose, that GIGO.

    The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.

    It is not GIGO. pm=project manager. pn=project name. I needed search()
    rather than match().


    s = "pn=jose pn=2017"
    ...
    s0 = r0.match(s)
    s0
    <re.Match object; span=(0, 15), match='pn=jose pn=2017'>




    --

    What if eternity is real? Where will you spend it? Hmmmm...

  • From jose isaias cabrera@21:1/5 to [email protected] on Thu Mar 2 22:27:13 2023
    On Thu, Mar 2, 2023 at 8:30 PM Cameron Simpson <[email protected]> wrote:

    On 02Mar2023 20:06, jose isaias cabrera <[email protected]> wrote:
    This re is a bit different than the one I am used to. So, I am trying to
    match everything after 'pn=':

    import re
    s = "pm=jose pn=2017"
    m0 = r"pn=(.+)"
    r0 = re.compile(m0)
    s0 = r0.match(s)

    `match()` matches at the start of the string. You want r0.search(s).
    - Cameron Simpson <[email protected]>

    Thanks. Darn it! I knew it was something simple.


    --

    What if eternity is real? Where will you spend it? Hmmmm...

  • From jose isaias cabrera@21:1/5 to [email protected] on Thu Mar 2 22:36:52 2023
    On Thu, Mar 2, 2023 at 9:56 PM Alan Bawden <[email protected]> wrote:

    jose isaias cabrera <[email protected]> writes:

    On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <[email protected]> wrote:

    This re is a bit different than the one I am used to. So, I am trying to match
    everything after 'pn=':

    import re
    s = "pm=jose pn=2017"
    m0 = r"pn=(.+)"
    r0 = re.compile(m0)
    s0 = r0.match(s)
    >>> print(s0)
    None

    Assuming that you were expecting to match "pn=2017", then you probably
    don't want the 'match' method. Read its documentation. Then read the
    documentation for the _other_ methods that a Pattern supports. Then you
    will be enlightened.

    Yes. I need search. Thanks.

    --

    What if eternity is real? Where will you spend it? Hmmmm...
