• while read -r line ; do problem

    From Bit Twister@21:1/5 to All on Fri Mar 4 16:44:15 2022
    while read -r line ; do problem

    $ bash --version
    GNU bash, version 5.1.4(1)-release (x86_64-mageia-linux-gnu)

    I have a bash script which reads a script file and updates variables.
    The contents of some lines are modified without my script's intervention.

    Code snippet
    1 while read -r line; do
    2 _t=$line
    3 set -- $(IFS='=' ; echo $_t)
    4 _wd=$1
    5 case "$_wd" in
    6 _ira_worth) line=" _ira_worth=$_ira_worth # from $_cons_fn" ;;
    7 <big if/case snip none of which modify _medicare line>
    8 echo $line >> $_tmp_fn
    9 done < $_taxes_paid_fn

    If you look at the following results from set -vx,
    you'll notice that the * in the _medicare line was converted to file names used in the script:

    read -r line
    _t='_medicare="$(echo "scale=2; 144.60 * 12" | bc)"'
    : IFS==
    echo _medicare '"$(echo "scale' '2; 144.60 * 12" | bc)"'
    set -- _medicare '"$(echo' '"scale' '2;' 144.60 202112.txt 2021_es_taxes_paid.txt aa cons_202112.txt uniform_rmd_wksht.pdf '12"' '|' 'bc)"'
    '[' 13 -gt 1 ']'
    _wd=_medicare

    Here is the
    echo $line >> $_tmp_fn
    which did/has the * junk/substitution

    echo '_medicare="$(echo' '"scale=2;' 144.60 202112.txt 2021_es_taxes_paid.txt aa cons_202112.txt uniform_rmd_wksht.pdf '12"' '|' 'bc)"'

    How can I prevent the * substitution and still be able to use the line modification
    like line 6 in the example snippet?

    Thanks in advance for any advice.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Bit Twister on Sat Mar 5 01:23:29 2022
    On 04.03.2022 23:44, Bit Twister wrote:
    [quoted original post snipped]

    If you quote your variables on expansion ("$var") the * as part of your variable value will not expand to file names.
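
    A minimal sketch of the difference, assuming a scratch directory that holds two hypothetical files (aa and 202112.txt, names borrowed from the trace in the original post). The unquoted expansion undergoes word splitting and pathname expansion, so the lone * is replaced by file names; the quoted expansion is passed through unchanged:

```shell
#!/bin/sh
# Scratch directory with two known files so the glob has something to match.
demo_dir=$(mktemp -d)
touch "$demo_dir/aa" "$demo_dir/202112.txt"

line='_medicare="$(echo "scale=2; 144.60 * 12" | bc)"'

# Unquoted: the shell splits $line into words and expands the lone * to file names.
unquoted=$(cd "$demo_dir" && echo $line)
# Quoted: the value reaches echo as a single, unmodified argument.
quoted=$(cd "$demo_dir" && echo "$line")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

rm -rf "$demo_dir"
```

    Run in an empty directory, the unquoted form would appear to work, which is why this kind of bug tends to surface only once files exist next to the script.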

    Janis




  • From Ed Morton@21:1/5 to Bit Twister on Fri Mar 4 18:49:33 2022
    On 3/4/2022 4:44 PM, Bit Twister wrote:
    [quoted original post snipped]



    1) Always quote your shell variables unless you have an explicit reason
    not to, see https://mywiki.wooledge.org/Quotes.

    2) If you're going to use a shell read loop then always use both `IFS=`
    and `-r`:

    while IFS= read -r line

    unless you have an explicit reason not to, see https://mywiki.wooledge.org/BashFAQ/001.

    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
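
    To illustrate point 2, here is a small sketch of what each piece of `IFS= read -r` protects against. The sample line is hypothetical; it was chosen to contain surrounding blanks and backslashes:

```shell
#!/bin/sh
# Hypothetical sample line: leading/trailing blanks plus backslashes.
sample='   path=C:\temp\new   '

# Plain read: trims surrounding whitespace and treats backslashes as escapes.
read plain <<EOF
$sample
EOF

# read -r: backslashes survive, but surrounding whitespace is still trimmed.
read -r raw <<EOF
$sample
EOF

# IFS= read -r: the line arrives exactly as written.
IFS= read -r exact <<EOF
$sample
EOF

printf '[%s]\n' "$plain" "$raw" "$exact"
```

    Only the last form returns the input unchanged, which matters when the lines are written back to a file as in the original script.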

    Regards,

    Ed.

  • From Bit Twister@21:1/5 to Janis Papanagnou on Fri Mar 4 19:09:27 2022
    On Sat, 5 Mar 2022 01:23:29 +0100, Janis Papanagnou wrote:
    [quoted original post snipped]

    If you quote your variables on expansion ("$var") the * as part of your variable value will not expand to file names.

    Janis

    echo "$line" >> $_tmp_fn
    was/is the solution/fix.
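
    For reference, a sketch of how the whole loop might look with every expansion quoted. The values and temporary files are hypothetical stand-ins for the real ones, and the name before the first = is taken with a parameter expansion instead of an unquoted `set --`, which is an assumption about what line 3 of the snippet was meant to do:

```shell
#!/bin/sh
# Hypothetical stand-ins for the original script's variables and files.
_ira_worth=12345.67
_cons_fn=cons_202112.txt
_taxes_paid_fn=$(mktemp)
_tmp_fn=$(mktemp)

cat > "$_taxes_paid_fn" <<'EOF'
_ira_worth=0
_medicare="$(echo "scale=2; 144.60 * 12" | bc)"
EOF

while IFS= read -r line; do
    _wd=${line%%=*}     # text before the first '=', with no word splitting or globbing
    case "$_wd" in
        _ira_worth) line=" _ira_worth=$_ira_worth # from $_cons_fn" ;;
    esac
    printf '%s\n' "$line" >> "$_tmp_fn"   # quoted: the * stays a literal *
done < "$_taxes_paid_fn"

cat "$_tmp_fn"
```

    Because the output is quoted, the leading blank in the replacement string from line 6 is now preserved as well; the unquoted echo silently dropped it.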


    --
    The warranty and liability expired as you read this message.
    If the above breaks your system, it's yours and you keep both pieces.
    Practice safe computing. Backup the file before you change it.
    Do a, man command_here or cat command_here, before using it.

  • From Sivaram Neelakantan@21:1/5 to Ed Morton on Sun Mar 6 22:05:54 2022
    On Fri, Mar 04 2022,Ed Morton wrote:


    [snipped 37 lines]


    1) Always quote your shell variables unless you have an explicit reason
    not to, see https://mywiki.wooledge.org/Quotes.

    2) If you're going to use a shell read loop then always use both `IFS=`
    and `-r`:

    while IFS= read -r line

    unless you have an explicit reason not to, see https://mywiki.wooledge.org/BashFAQ/001.

    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

    [snipped 6 lines]

    what then, would be a better way to use the shell for line by line
    processing? The stackexchange answer clearly says, people are
    mimicking C lang style and other issues, which I agree with. What
    should a novice do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc which sort of makes them do all this.


    sivaram
    --

  • From Janis Papanagnou@21:1/5 to Sivaram Neelakantan on Sun Mar 6 18:56:50 2022
    On 06.03.2022 17:35, Sivaram Neelakantan wrote:
    On Fri, Mar 04 2022,Ed Morton wrote:

    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see
    https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

    what then, would be a better way to use the shell for line by line processing? The stackexchange answer clearly says, people are
    mimicking C lang style and other issues, which I agree with. What
    should a novice do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc which sort of makes them do all this.

    If a novice wants to manipulate data files I'd suggest to not use
    the shell "hammer" but an appropriate tool for data manipulation,
    e.g. awk. (I agree that a novice not knowing all the Unix tools
    will have a harder job learning them - or rather, getting to know
    about their existence in the first place -, but awk is simple to
    learn and makes a lot of the Unix tools just unnecessary.) In case
    of having to do a lot of typical shell process handling based on
    that data there's also the possibility to transform the data and
    build the data-specific shell commands in awk and pipe them to 'sh':

        awk '...create data-based shell commands...' data-file | sh

    If the data is actually just within a few hundreds (or even a few
    thousands) of lines I also wouldn't care much using a shell. That
    depends on the data, its transformation, and application, though.
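
    A small sketch of the awk-to-sh idea, assuming a hypothetical data file with one "name count" pair per line. The generated commands are harmless echo calls here; a real script would emit whatever per-record command is needed:

```shell
#!/bin/sh
# Hypothetical data file: one "name count" pair per line.
data_file=$(mktemp)
printf '%s\n' 'alice 3' 'bob 5' > "$data_file"

# awk turns each record into one shell command; sh then executes the stream.
# Only do this with trusted data: whatever awk prints is run as code.
result=$(awk '{ printf "echo \"user %s has %d items\"\n", $1, $2 }' "$data_file" | sh)

printf '%s\n' "$result"
rm -f "$data_file"
```

    Dropping the final `| sh` prints the generated commands instead of running them, which is a convenient way to check them first.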

    Janis


    sivaram


  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Sun Mar 6 19:31:48 2022
    On 06.03.2022 18:56, Janis Papanagnou wrote:

    If the data is actually just within a few hundreds (or even a few
    thousands) of lines I also wouldn't care much using a shell. That
    depends on the data, its transformation, and application, though.

    Oops - I think the semantics of this was wrongly formulated. I meant:

    If the data is actually just within a few hundreds (or even a few
    thousands) of lines I also wouldn't care much _and use_ a shell.
    [and of course, if appropriate]

    Janis

  • From Ed Morton@21:1/5 to Sivaram Neelakantan on Sun Mar 6 13:32:34 2022
    On 3/6/2022 10:35 AM, Sivaram Neelakantan wrote:
    On Fri, Mar 04 2022,Ed Morton wrote:


    [snipped 37 lines]


    1) Always quote your shell variables unless you have an explicit reason
    not to, see https://mywiki.wooledge.org/Quotes.

    2) If you're going to use a shell read loop then always use both `IFS=`
    and `-r`:

    while IFS= read -r line

    unless you have an explicit reason not to, see
    https://mywiki.wooledge.org/BashFAQ/001.

    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see
    https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

    [snipped 6 lines]

    what then, would be a better way to use the shell for line by line processing? The stackexchange answer clearly says, people are
    mimicking C lang style and other issues, which I agree with. What
    should a novice do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc which sort of makes them do all this.

    People shouldn't be writing shell scripts unless they do know about the
    most common mandatory POSIX tools though. Doing so would be like trying
    to build a house when all you know how to use is a toolbelt and you've
    never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is
    designed to hold.

    In general if you want to do small, simple operations then use tools
    like sed, grep, cut, etc. but if you find yourself creating lengthy
    and/or complicated pipelines of those or being tempted to write a shell
    loop to process multi-line text then you should be using awk instead.

    Again - the above is about manipulating text. If you find yourself
    needing to manipulate (create/destroy) files or processes THEN a shell
    loop may be appropriate (if xargs isn't a better solution).
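
    As a rough illustration of the awk route, the line rewrite from the original post could be sketched as below. The variable values and the input file are hypothetical stand-ins, and only the one _ira_worth rule from the posted snippet is reproduced:

```shell
#!/bin/sh
# Hypothetical stand-ins for the original script's variables and input file.
_ira_worth=12345.67
_cons_fn=cons_202112.txt
_taxes_paid_fn=$(mktemp)
cat > "$_taxes_paid_fn" <<'EOF'
_ira_worth=0
_medicare="$(echo "scale=2; 144.60 * 12" | bc)"
EOF

# Rewrite the _ira_worth line; print every other line unchanged.
updated=$(awk -v worth="$_ira_worth" -v src="$_cons_fn" '
    /^_ira_worth=/ { print "_ira_worth=" worth " # from " src; next }
    { print }
' "$_taxes_paid_fn")

printf '%s\n' "$updated"
rm -f "$_taxes_paid_fn"
```

    awk never hands the input lines to the shell for expansion, so the * in the _medicare line cannot be globbed. One caveat: values passed with -v have backslash escapes interpreted by awk.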

    Ed.

  • From Sivaram Neelakantan@21:1/5 to Ed Morton on Mon Mar 7 16:39:57 2022
    On Sun, Mar 06 2022,Ed Morton wrote:


    [snipped 26 lines]


    People shouldn't be writing shell scripts unless they do know about the
    most common mandatory POSIX tools though. Doing so would be like trying
    to build a house when all you know how to use is a toolbelt and you've
    never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is designed to hold.

    On that standard, no one would ever get started on shell then, would
    they? Me, I started with Pike's UPE on a linux bash shell till I saw
    people getting politely chewed out for not being posixy/portable in
    c.u.shell. And I'm not the only clown in this circus. And no, I
    haven't seen/read one posix doc, though I have seen it being quoted
    here.


    In general if you want to do small, simple operations then use tools
    like sed, grep, cut, etc. but if you find yourself creating lengthy
    and/or complicated pipelines of those or being tempted to write a shell
    loop to process multi-line text then you should be using awk instead.


    As a low level sysadmin thrown in the deep end of a bog standard prod
    support project decades ago, I have seen 5/10/15 yr scripts with the
    above abused paradigm. I didn't touch or change it nor did the
    retiring AT&T/Sprint/H3G/others chap who handed it over to me.
    Unfortunately I used to use the template because it's been working for
    so long. Talk about picking the one idea of the 1000s of shell script
    which was bad. :-)

    Again - the above is about manipulating text. If you find yourself
    needing to manipulate (create/destroy) files or processes THEN a shell
    loop may be appropriate (if xargs isn't a better solution).


    I suspect that with no one telling what's the best or optimal way to
    save tears down the road, it's just like the mess you described. It's
    good thing that my mistakes are generally not earth altering....so
    far.

    sivaram
    --

  • From Ed Morton@21:1/5 to Sivaram Neelakantan on Mon Mar 7 06:56:49 2022
    On 3/7/2022 5:09 AM, Sivaram Neelakantan wrote:
    On Sun, Mar 06 2022,Ed Morton wrote:


    [snipped 26 lines]


    People shouldn't be writing shell scripts unless they do know about the
    most common mandatory POSIX tools though. Doing so would be like trying
    to build a house when all you know how to use is a toolbelt and you've
    never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is
    designed to hold.

    On that standard, no one would ever get started on shell then, would
    they?

    You're suggesting people are out there learning to write:

    while IFS= read -r line; do
    if [[ $line =~ foo ]]; then
    echo "$line"
    fi
    done < file

    before they've learned:

    grep 'foo' file

    I don't buy it.

    Me, I started with Pike's UPE on a linux bash shell till I saw
    people getting politely chewed out for not being posixy/portable in c.u.shell. And I'm not the only clown in this circus. And no, I
    haven't seen/read one posix doc, though I have seen it being quoted
    here.

    I'm not saying you need to read POSIX docs, I'm saying you need to have
    heard of the most common tools that are required by POSIX to exist on
    all Unix systems.

    It's been 40+ years since I learned about Unix but as I recall the
    starting point was "here's how to find text in a file" followed by grep
    and ditto for when/how to call sed, cut, head, tail, etc. It was only
    later we learned how to write shell scripts to glue them together and I
    can't imagine how it'd make any sense to learn how to write the glue first.

    Ed.



  • From Kenny McCormack@21:1/5 to [email protected] on Tue Mar 8 16:40:55 2022
    In article <[email protected]>,
    Sivaram Neelakantan <[email protected]> wrote:
    ...
    what then, would be a better way to use the shell for line by line processing? The stackexchange answer clearly says, people are mimicking
    C lang style and other issues, which I agree with. What should a novice
    do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc
    which sort of makes them do all this.

    I usually use MAPFILE (in bash) for this. MAPFILE reads an entire file or process into an array. Then you can iterate the array. So, you end up
    with:

    mapfile -t < file
    for i in "${MAPFILE[@]}"
    do
    ...
    done

    Or, to do it with a process (the more common case):

    mapfile -t < <(process)
    for i in "${MAPFILE[@]}"
    do
    ...
    done
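
    A runnable version of the second form, as a sketch: `produce` is a hypothetical stand-in for the process, and mapfile requires bash 4 or later.

```shell
#!/usr/bin/env bash
# "produce" is a hypothetical stand-in for the real process.
produce() {
    printf '%s\n' 'first line' '  second * line'
}

mapfile -t < <(produce)      # -t strips the trailing newline from each element

for i in "${MAPFILE[@]}"; do
    printf '[%s]\n' "$i"     # quoted, so blanks and the * are kept literally
done
```

    Each line becomes one array element exactly as read, including leading blanks, and the quoted "${MAPFILE[@]}" keeps the elements from being split or globbed.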

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/FiftyPercent

  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Tue Mar 8 18:39:13 2022
    On 08.03.2022 17:40, Kenny McCormack wrote:
    In article <[email protected]>,
    Sivaram Neelakantan <[email protected]> wrote:
    ...
    what then, would be a better way to use the shell for line by line
    processing? The stackexchange answer clearly says, people are mimicking
    C lang style and other issues, which I agree with. What should a novice
    do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc
    which sort of makes them do all this.

    I usually use MAPFILE (in bash) for this. MAPFILE reads an entire file or process into an array. Then you can iterate the array. So, you end up
    with:

    mapfile -t < file
    for i in "${MAPFILE[@]}"
    do
    ...
    done


    Nice bash feature, didn't know it.

    In other shells (like the ksh I use) I'd have to do something like

    IFS=$'\n' MAPFILE=( $(< mapfile-data) )

    to populate an array.

    Janis

  • From William Ahern@21:1/5 to Ed Morton on Tue Mar 8 18:46:18 2022
    Ed Morton <[email protected]> wrote:
    On 3/4/2022 4:44 PM, Bit Twister wrote:
    <snip>
    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.


    IMO, this is not great advice.

    1) If raw throughput matters, you shouldn't be using shell text processing
    in the first place; use sed or awk, at the very least, instead.

    2) If the input is coming from a pipe, what the read loop buys you is concurrency and parallelism. If the process generating the input has high latency, the concurrency can help tremendously. If either side uses a lot of CPU, the parallelism might help performance, overcoming the byte-by-byte
    read overhead.

    Case example: last year I downloaded a company engineer-managed script that updated routing tables, created as a workaround for a poorly managed IPSec
    VPN configuration deployed on company laptops. When I ran the script it
    seemed to hang, so I'd kill it and run it again. After a few minutes I
    decided to dive into the script to figure out what was happening. The fundamental problem was that they had one routine generating a list of addresses and another routine consuming the list in a loop. Crucially, the latter, second routine was using the Bash'ism to slurp the input into an
    array for processing using a for-loop rather than a while-read-loop. The address-generating routine was doing network I/O to download and preprocess
    the lists, which was taking considerable time. Meanwhile, the second loop
    was completely idle waiting for the first to finish. The second loop also incurred some surprisingly high latency per address (IIRC, might have been every invocation of route(1) doing reverse DNS or some such). Long story
    short, the entire script took much longer to complete than if they had used
    a simple while-read-loop, permitting both loops to run concurrently. Plus,
    the script would have provided immediate feedback that things were actually progressing.
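
    The effect can be sketched with a deliberately slow producer. Everything here is hypothetical (the `produce` function, the one-second delay, the log files); it is not the script described above, only an illustration of streaming versus slurping, and the mapfile half needs bash 4 or later:

```shell
#!/usr/bin/env bash
# Hypothetical slow producer: logs each item, emits it, then waits a second.
produce() {
    local log=$1 n
    for n in 1 2; do
        echo "produced $n" >> "$log"
        echo "$n"
        sleep 1
    done
}

stream_log=$(mktemp)
slurp_log=$(mktemp)

# Streaming: the consumer handles each item while the producer is still working.
produce "$stream_log" | while IFS= read -r n; do
    echo "consumed $n" >> "$stream_log"
done

# Slurping: nothing is consumed until the producer has finished.
mapfile -t items < <(produce "$slurp_log")
for n in "${items[@]}"; do
    echo "consumed $n" >> "$slurp_log"
done

echo "streaming:"; cat "$stream_log"
echo "slurping:";  cat "$slurp_log"
```

    In the streaming log the produced and consumed entries interleave; in the slurping log every produced entry comes before the first consumed one.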

    You see something similar with the widespread adoption of map-filter-reduce functional patterns in languages like JavaScript. The current popularity
    seems to have been kicked off by admiration for Haskell-style algorithms,
    which once upon a time were popular blog fodder. But Haskell uses lazy list evaluation, unlike languages like JavaScript. The result is that the new preferred pattern results in a tremendous amount of memory usage and churn,
    as every transformation step requires constructing and populating a whole
    new array. It makes for some horribly inefficient programs; inefficient in a quite opaque way, whereas with traditional patterns the unnecessary array duplications would be immediately obvious, particularly if reading the code with an eye toward improving performance. (Traditional patterns also don't create a bunch of closures, which can create barriers to JIT optimization.)

    Some of the old patterns--e.g. shell pipes--are far more sophisticated than people give them credit for today. See, e.g., this 2014 paper by Doug
    McIlroy, inventor of the Unix pipe, describing the equivalency between coroutines, pipes, and lazy lists:

    https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf

  • From Sivaram Neelakantan@21:1/5 to Kenny McCormack on Wed Mar 9 08:16:51 2022
    On Tue, Mar 08 2022,Kenny McCormack wrote:


    [snipped 9 lines]

    I usually use MAPFILE (in bash) for this. MAPFILE reads an entire file or process into an array. Then you can iterate the array. So, you end up
    with:

    mapfile -t < file
    for i in "${MAPFILE[@]}"
    do
    ...
    done

    Or, to do it with a process (the more common case):

    mapfile -t < <(process)
    for i in "${MAPFILE[@]}"
    do
    ...
    done

    Thanks for this, this is news to me.

    sivaram
    --

  • From Ed Morton@21:1/5 to William Ahern on Wed Mar 9 06:39:02 2022
    On 3/8/2022 8:46 PM, William Ahern wrote:
    Ed Morton <[email protected]> wrote:
    On 3/4/2022 4:44 PM, Bit Twister wrote:
    <snip>
    3) Don't use a shell loop just to manipulate text as you seem to be
    doing, see
    https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.


    IMO, this is not great advice.

    1) If raw throughput matters, you shouldn't be using shell text processing
    in the first place; use sed or awk, at the very least, instead.

    That's the same advice.

    I'm not sure what you're saying below. It sounds like you're discussing
    some bad software you came across that you improved by replacing a
    couple of for loops with a while loop but obviously that doesn't mean it couldn't have been further improved by using, say, awk instead. Can you
    provide a concise sample shell script that clearly and simply just
    demonstrates what you're describing below and some way of generating
    sample input to help us understand what you're describing and so we can
    test it?

    Ed.


    2) If the input is coming from a pipe, what the read loops buys you is concurrency and parallelism. If the process generating the input has high latency, the concurrency can help tremendously. If either side uses alot of CPU, the parallelism might help performance, overcoming the byte-by-byte issue.

    Case example: last year I downloaded a company engineer-managed script that updated routing tables, created as a workaround for a poorly managed IPSec VPN configuration deployed on company laptos. When I ran the script it
    seemed to hang, so I'd kill it and run it again. After a few minutes I decided to dive into the script to figure out what was happening. The fundamental problem was that they had one routine generating a list of addresses and another routine consuming the list in a loop. Crucially, the latter, second routine was using the Bash'ism to slurp the input into an array for processing using a for-loop rather than a while-read-loop. The address-generating routine was doing network I/O to download and preprocess the lists, which was taking considerable time. Meanwhile, the second loop
    was completely idle waiting for the first to finish. The second loop also incurred some surprisingly high latency per address (IIRC, might have been every invocation of route(1) doing reverse DNS or some such). Long story short, the entire script took much longer to complete than if they had used
    a simple while-read-loop, permitting both loops to run concurrently. Plus, the script would have provided immediate feedback that things were actually progressing.

    You see something similar with the widespread adoption of map-filter-reduce functional patterns in languages like JavaScript. The current popularity seems to have been kicked off by admiration for Haskell-style algorithms, which once upon a time were popular blog fodder. But Haskell uses lazy list evaluation, unlike languages like JavaScript. The result is that the new preferred pattern results in a tremendous amount of memory usage and churn, as every transformation step requires constructing and populating a whole
    new array. It makes for some horribly inefficient programs; inefficient in a quite opaque way, whereas with traditional patterns the unnecessary array duplications would be immediately obvious, particularly if reading the code with an eye toward improving performance. (Also aren't creating a bunch of closures, which can create barriers to JIT optimization.)

    Some of the old patterns--e.g. shell pipes--are far more sophisticated than people give them credit for today. See, e.g., this 2014 paper by Doug McIlroy, inventor of the Unix pipe, describing the equivalency between coroutines, pipes, and lazy lists:

    https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf

  • From Janis Papanagnou@21:1/5 to Ed Morton on Wed Mar 9 22:38:43 2022
    On 09.03.2022 13:39, Ed Morton wrote:
    On 3/8/2022 8:46 PM, William Ahern wrote:

    I'm not sure what you're saying below. It sounds like you're discussing
    some bad software you came across that you improved by replacing a
    couple of for loops with a while loop but obviously that doesn't mean it couldn't have been further improved by using, say, awk instead. Can you provide a concise sample shell script that clearly and simply just demonstrates what you're describing below and some way of generating
    sample input to help us understand what you're describing and so we can
    test it?

    What I had associated with the described text was...

    1) for loop - that reads completely constructed data - gets replaced
    by sequential and parallelisable processing pipe, e.g.

    for f in `ls` # often seen [bad] pattern
    for f in * # implicitly also sorting

    (note: the "lazy evaluation" concept that the poster mentioned, if
    implemented in shell, could probably handle both cases as well)

    vs.

    ls | while read

    That example is certainly not accurate describing the intention since
    'ls' is implicitly also sorting and requires to store all elements.
    So replace 'ls' by 'some_arbitrary_non_buffering_data_generator'.

    2)
    Then the hypothesis that the slow (character-wise read) 'while read'
    could be negligible [in certain cases] if the left-hand-side process
    requires more execution time than the character-wise read.
    So replace 'ls' by 'some_arbitrary_non_buffering_slow_data_generator'.

    3)
    Many processes can be parallelised on modern systems (scheduling on
    multi-core or multi-CPU systems), so

    x | y | z # may run in parallel

    (note: the individual processes may also slow down the pipe when
    storing huge amounts of data; e.g. in cases like using 'sort')

    vs.

    xyz # monolithic tool (hypothesis: non-parallelisable)

    (note: the latter presumes that the processing steps have to be
    or are implemented in a linear, sequential way in 'xyz')

    4)
    Finally co-processes are mentioned (not directly related) but directly supported by the modern powerful shells (or use external Unix tools).

    This is my interpretation of the text. (The author will correct any misunderstandings, I hope.) Note: I am just interpreting, not valuing
    what has been said (or what I think had been said).

    Janis

