• regsub replacement question

    From aotto1968@21:1/5 to All on Fri Mar 22 08:29:20 2024
    Hi,

    # I have a question regarding *regsub* and how to accelerate replacement
    # let's assume the following code:

    set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

    # My goal is to eliminate all "123" except the FIRST one, with the restriction
    # that there is *no* number other than 123 between the "123"s
    puts [regsub -all {(\d+)([^\d]*)\1} $str {\1\2}]
    aaa123bbbccc123dddeee123fffggg

    # → my problem is that only every SECOND "123" is removed, because the replacement itself is *not*
    # checked again.

    # my solution is a loop
    while {[regsub -all {(\d+)([^\d]*)\1} $str {\1\2} str]} ""

    # this works but the GOAL is to have ONE *regsub* to get this job done
    puts $str
    aaa123bbbcccdddeeefffggg


    Best regards, ao

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Leitgeb@21:1/5 to [email protected] on Fri Mar 22 08:04:48 2024
    aotto1968 <[email protected]> wrote:
    > # I have a question regarding *regsub* and how to accelerate replacement
    > # let's assume the following code:
    > set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    >
    > # My goal is to eliminate all "123" except the FIRST one, with the restriction
    > # that there is *no* number other than 123 between the "123"s
    > puts [regsub -all {(\d+)([^\d]*)\1} $str {\1\2}]
    > aaa123bbbccc123dddeee123fffggg
    >
    > # → my problem is that only every SECOND "123" is removed, because the replacement itself is *not*
    > # checked again.

    That is correct: after the first substitution of "123bbb123" with "123bbb",
    the scan resumes behind the match, so in the remainder it no longer sees
    the "ccc" wrapped in "123"s and cannot eliminate the "123" that follows "ccc".
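
A small sketch (not from the original posts) that makes the skipped matches visible. It lists what `regexp -all -inline` finds for the same pattern; since matches cannot overlap, each one consumes both of its "123"s and the scan resumes behind the second one.

```tcl
set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

# -all -inline returns a flat list: whole match, group 1, group 2, repeated per match.
set matches [regexp -all -inline {(\d+)([^\d]*)\1} $str]

foreach {whole num between} $matches {
    # Each match swallows two "123"s, so "ccc" and "eee" are never seen between two of them.
    puts "matched '$whole' -> kept '$num$between'"
}
# matched '123bbb123' -> kept '123bbb'
# matched '123ddd123' -> kept '123ddd'
# matched '123fff123' -> kept '123fff'
```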

    > # my solution is a loop
    > while {[regsub -all {(\d+)([^\d]*)\1} $str {\1\2} str]} ""

    I think this is the way to go, but you might experiment with
    removing the "-all" option... Maybe it improves speed, or
    maybe it spoils it, I can't predict.
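
One way to run that experiment, as a rough sketch: the proc names `collapse_all` and `collapse_first` are made up for this example, and the timings depend entirely on the input and the Tcl build, so no figures are claimed here.

```tcl
set pattern {(\d+)([^\d]*)\1}
set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

# Loop variant from the thread: every pass substitutes all non-overlapping matches.
proc collapse_all {str pattern} {
    while {[regsub -all -- $pattern $str {\1\2} str]} {}
    return $str
}

# Same loop without -all: every pass substitutes only the first match.
proc collapse_first {str pattern} {
    while {[regsub -- $pattern $str {\1\2} str]} {}
    return $str
}

# [time] reports the average microseconds per iteration.
puts "with -all:    [time {collapse_all $str $pattern} 1000]"
puts "without -all: [time {collapse_first $str $pattern} 1000]"
```

Both variants end with the same string; only the number of passes differs.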

    > # this works but the GOAL is to have ONE *regsub* to get this job done
    > puts $str
    > aaa123bbbcccdddeeefffggg

    Another approach could be to extract the non-"123"s as a list
    with regexp (not regsub), and then just re-insert the number:

    set num [regexp -inline {\d+} $str];# get the separating number
    set list [regexp -inline -all {\D+} $str] ;# \D is like [^\d]
    puts [join [linsert $list 1 $num] ""]

    (unless you also need to deal with aaa123bbb123ccc456ddd456 where
    the first 456 also needs to stay...)
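
For that mixed case, a possible single-pass sketch: split the string into digit runs and non-digit runs, and drop a number only when it equals the last number that was kept. The proc name `dedupe_numbers` is hypothetical. Note that it compares whole digit runs, so it is not exactly equivalent to the backreference pattern (which can also match part of a longer number).

```tcl
proc dedupe_numbers {str} {
    set out ""
    set last ""   ;# the most recent number that was kept
    # Tokenize into alternating runs of digits and non-digits.
    foreach tok [regexp -all -inline {\d+|\D+} $str] {
        if {[string is digit -strict $tok]} {
            # Same number as the previous one: skip it.
            if {$tok eq $last} { continue }
            set last $tok
        }
        append out $tok
    }
    return $out
}

puts [dedupe_numbers "aaa123bbb123ccc456ddd456"]
# aaa123bbbccc456ddd
puts [dedupe_numbers "aaa123bbb123ccc123ddd123eee123fff123ggg"]
# aaa123bbbcccdddeeefffggg
```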

  • From Ralf Fassel@21:1/5 to All on Fri Mar 22 12:05:35 2024
    * aotto1968 <[email protected]>
    | set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

    | # My goal is to eliminate the all "123" except the FIRST one with the
    | # restriction that between the "123" is *not* a number other then 123

    If that *really* is the goal, I would simply search for the first "123"
    and then [string map] the rest of them to "":

    set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    set res [string range $str 0 [string first 123 $str]+2]
    append res [string map {123 ""} [string range $str [string first 123 $str ]+3 end]]
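
The two lines above assume that "123" actually occurs in the string. A sketch of the same idea wrapped in a proc that also handles the not-found case ([string first] returns -1 then); the name `keep_first` and its arguments are invented for illustration.

```tcl
proc keep_first {str needle} {
    set idx [string first $needle $str]
    # Nothing to do when the needle does not occur at all.
    if {$idx < 0} { return $str }
    # Index of the first character after the first occurrence.
    set cut [expr {$idx + [string length $needle]}]
    set head [string range $str 0 [expr {$cut - 1}]]
    set tail [string range $str $cut end]
    # Keep the head untouched, remove every later occurrence from the tail.
    return $head[string map [list $needle ""] $tail]
}

puts [keep_first "aaa123bbb123ccc123ddd123eee123fff123ggg" 123]
# aaa123bbbcccdddeeefffggg
```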

    I do not completely understand the second part of the restriction, though...

    R'

  • From aotto1968@21:1/5 to Ralf Fassel on Sat Mar 23 10:00:16 2024
    On 22.03.24 12:05, Ralf Fassel wrote:
    > * aotto1968 <[email protected]>
    > | set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    >
    > | # My goal is to eliminate the all "123" except the FIRST one with the
    > | # restriction that between the "123" is *not* a number other then 123
    >
    > If that *really* is the goal, I would simply search for the first "123"
    > and then [string map] the rest of them to "":
    >
    > set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    > set res [string range $str 0 [string first 123 $str]+2]
    > append res [string map {123 ""} [string range $str [string first 123 $str ]+3 end]]
    >
    > I do not completely understand the second part of the restriction, though...
    >
    > R'

    As always, the real problem is much more complicated than the easy example above.

    My aim is a solution that does not call *regsub* multiple times. To achieve this,
    *regsub* would have to re-scan the substituted text as part of the "-all" switch. With the '-start' switch, *regsub* already implements the ability needed to get this done.

    The "123" is just an example, because this question is a follow-up to the
    https://wiki.tcl-lang.org/page/BUG+%2D+%27string+length%27+count+also+NON+visible+chars
    problem.

    In reality, the "123" is the regular expression:

    \u001b\[[0-9;]*m

    and the

    regsub -all {(\d+)([^\d]*)\1} $str {\1\2}

    is in reality:

    # erase CTRL->CTRL doublets
    while {[regsub -all {(\u001b\[[0-9;]*m)([^\u001b]*)\1} $STR {\1\2} STR]} {
    #puts fire0
    }

    The CORE problem of the

    while {[regsub -all ...]} ""

    is that on every iteration the entire STR is processed again, and *not* just the part starting at the *last*
    substitution.
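
A possible way around the full re-scan, sketched with [regexp -indices -start] instead of [regsub -all]: after each substitution the search resumes at the start of the match that was just shortened, so the text before it is not scanned again. The proc name `collapse_repeats` is hypothetical; it assumes the pattern has exactly two capture groups followed by a backreference, as in the thread, and that group 1 never matches the empty string (otherwise the loop would not terminate). Whether this is actually faster than the while/regsub loop would have to be measured.

```tcl
proc collapse_repeats {str re} {
    set pos 0
    # With -indices, the reported positions are absolute even when -start is used.
    while {[regexp -indices -start $pos -- $re $str whole first between]} {
        lassign $whole from to
        # Keep group 1 and group 2, drop the repeated copy of group 1.
        set keep [string range $str {*}$first][string range $str {*}$between]
        set str [string replace $str $from $to $keep]
        # Resume at the kept sequence so it can pair with the next repeat.
        set pos $from
    }
    return $str
}

puts [collapse_repeats "aaa123bbb123ccc123ddd123eee123fff123ggg" {(\d+)([^\d]*)\1}]
# aaa123bbbcccdddeeefffggg

# The real-world pattern from the post: repeated identical escape sequences.
set ansi {(\u001b\[[0-9;]*m)([^\u001b]*)\1}
set STR [collapse_repeats "\u001b\[1mA\u001b\[1mB\u001b\[1mC" $ansi]
```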
