Forum: >>> Magnum BBS <<<

while read -r line ; do problem

From Bit Twister@21:1/5 to All on Fri Mar 4 16:44:15 2022

while read -r line ; do problem

$ bash --version
GNU bash, version 5.1.4(1)-release (x86_64-mageia-linux-gnu)

I have a bash script which reads a script file and updates variables.
The contents of some lines are modified without my script's intervention.

Code snippet
1 while read -r line; do
2 _t=$line
3 set -- $(IFS='=' ; echo $_t)
4 _wd=$1
5 case "$_wd" in
6 _ira_worth) line=" _ira_worth=$_ira_worth # from $_cons_fn" ;;
7 <big if/case snip none of which modify _medicare line>
8 echo $line >> $_tmp_fn
9 done < $_taxes_paid_fn

If you look at the following output from set -vx, you'll notice that the * in
the _medicare line was converted to file names used in the script.

read -r line
_t='_medicare="$(echo "scale=2; 144.60 * 12" | bc)"'
: IFS==
echo _medicare '"$(echo "scale' '2; 144.60 * 12" | bc)"'
set -- _medicare '"$(echo' '"scale' '2;' 144.60 202112.txt 2021_es_taxes_paid.txt aa cons_202112.txt uniform_rmd_wksht.pdf '12"' '|' 'bc)"'
'[' 13 -gt 1 ']'
_wd=_medicare

Here is the
echo $line >> $_tmp_fn
which did/has the * junk/substitution

echo '_medicare="$(echo' '"scale=2;' 144.60 202112.txt 2021_es_taxes_paid.txt aa cons_202112.txt uniform_rmd_wksht.pdf '12"' '|' 'bc)"'

How can I prevent the * substitution and still be able to use the line
modification like line 6 in the example snippet?

Thanks in advance for any advice.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Janis Papanagnou@21:1/5 to Bit Twister on Sat Mar 5 01:23:29 2022

If you quote your variables on expansion ("$var") the * as part of your variable value will not expand to file names.
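
A minimal sketch of the difference, assuming a scratch directory that holds a couple of files (the directory and the file names aa and bb.txt are made up for illustration; they are not the files from the original post):

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the glob has something to match.
# (Hypothetical file names, chosen only for this demonstration.)
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch aa bb.txt

line='_medicare="$(echo "scale=2; 144.60 * 12" | bc)"'

# Unquoted: the value is word-split and the lone * is expanded to file names.
unquoted=$(echo $line)

# Quoted: the value reaches echo as a single word, exactly as it was read.
quoted=$(echo "$line")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

When the data may start with a dash or contain backslashes, printf '%s\n' "$line" is a common alternative to echo "$line".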

Janis


From Ed Morton@21:1/5 to Bit Twister on Fri Mar 4 18:49:33 2022

1) Always quote your shell variables unless you have an explicit reason
not to, see https://mywiki.wooledge.org/Quotes.

2) If you're going to use a shell read loop then always use both `IFS=`
and `-r`:

while IFS= read -r line

unless you have an explicit reason not to, see https://mywiki.wooledge.org/BashFAQ/001.

3) Don't use a shell loop just to manipulate text as you seem to be
doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
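
To illustrate point 2, here is a small sketch (the sample input is invented) of what each part of `IFS= read -r` protects against. Leading blanks matter for this thread because line 6 of the original snippet builds a line that starts with a space.

```shell
#!/usr/bin/env bash
# Sample input: leading blanks and a backslash, both easy to lose.
input='   indented \t text'

# Plain read: strips leading/trailing IFS whitespace and treats \ as an escape.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# read -r only: the backslash survives, but leading blanks are still stripped.
raw=$(printf '%s\n' "$input" | { read -r line; printf '%s' "$line"; })

# IFS= read -r: the line arrives unchanged.
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf '[%s]\n' "$plain" "$raw" "$exact"
```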

Regards,

Ed.


From Bit Twister@21:1/5 to Janis Papanagnou on Fri Mar 4 19:09:27 2022

On Sat, 5 Mar 2022 01:23:29 +0100, Janis Papanagnou wrote:

> If you quote your variables on expansion ("$var") the * as part of your
> variable value will not expand to file names.

echo "$line" >> $_tmp_fn
was/is the solution/fix.

--
The warranty and liability expired as you read this message.
If the above breaks your system, it's yours and you keep both pieces.
Practice safe computing. Backup the file before you change it.
Do a, man command_here or cat command_here, before using it.


From Sivaram Neelakantan@21:1/5 to Ed Morton on Sun Mar 6 22:05:54 2022

On Fri, Mar 04 2022, Ed Morton wrote:

[snipped 37 lines]

> 1) Always quote your shell variables unless you have an explicit reason
> not to, see https://mywiki.wooledge.org/Quotes.
>
> 2) If you're going to use a shell read loop then always use both `IFS=`
> and `-r`:
>
> while IFS= read -r line
>
> unless you have an explicit reason not to, see https://mywiki.wooledge.org/BashFAQ/001.
>
> 3) Don't use a shell loop just to manipulate text as you seem to be
> doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

[snipped 6 lines]

What, then, would be a better way to use the shell for line-by-line
processing? The Stack Exchange answer clearly says that people are
mimicking C-language style, among other issues, which I agree with. What
should a novice do then? I'm pretty sure they wouldn't know about
paste/join/cut/comm etc., which sort of makes them do all this.

sivaram
--


From Janis Papanagnou@21:1/5 to Sivaram Neelakantan on Sun Mar 6 18:56:50 2022

On 06.03.2022 17:35, Sivaram Neelakantan wrote:
> On Fri, Mar 04 2022, Ed Morton wrote:
>
>> 3) Don't use a shell loop just to manipulate text as you seem to be
>> doing, see
>> https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
>
> what then, would be a better way to use the shell for line by line
> processing? The stackexchange answer clearly says, people are
> mimicking C lang style and other issues, which I agree with. What
> should a novice do then? Pretty sure, they wouldn't know about
> paste/join/cut/comm etc which sort of makes them do all this.

If a novice wants to manipulate data files I'd suggest to not use
the shell "hammer" but an appropriate tool for data manipulation,
e.g. awk. (I agree that a novice not knowing all the Unix tools
will have a harder job learning them - or rather, getting to know
about their existence in the first place -, but awk is simple to
learn and makes a lot of the Unix tools just unnecessary.) In case
of having to do a lot of typical shell process handling based on
that data there's also the possibility to transform the data and
build the data-specific shell commands in awk and pipe it to 'sh':

awk '...create data-based shell commands...' data-file | sh

If the data is actually just within a few hundreds (or even a few
thousands) of lines I also wouldn't care much using a shell. That
depends on the data, its transformation, and application, though.
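
A rough sketch of the "build the commands in awk and pipe them to sh" idea mentioned above. The data file layout (one "directory file" pair per line) and the generated mkdir commands are assumptions made up for this illustration:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical data file: one "directory file" pair per line.
printf '%s\n' 'reports 2021.txt' 'reports 2022.txt' 'notes todo.txt' > data-file

# awk turns each record into one shell command. \047 is a single quote, so
# the generated arguments are quoted. That is only safe here because the
# sample data contains no single quotes itself.
gen_commands() {
    awk '{ printf "mkdir -p \047%s\047 && : > \047%s/%s\047\n", $1, $1, $2 }' data-file
}

gen_commands          # preview the generated commands first
gen_commands | sh     # then let sh execute them
```

Previewing the generated text before piping it to sh is cheap insurance, since whatever awk prints is executed verbatim.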

Janis


From Janis Papanagnou@21:1/5 to Janis Papanagnou on Sun Mar 6 19:31:48 2022

On 06.03.2022 18:56, Janis Papanagnou wrote:

> If the data is actually just within a few hundreds (or even a few
> thousands) of lines I also wouldn't care much using a shell. That
> depends on the data, its transformation, and application, though.

Oops - I think the semantics of this was wrongly formulated. I meant:

If the data is actually just within a few hundreds (or even a few
thousands) of lines I also wouldn't care much _and use_ a shell.
[and of course, if appropriate]

Janis


From Ed Morton@21:1/5 to Sivaram Neelakantan on Sun Mar 6 13:32:34 2022

On 3/6/2022 10:35 AM, Sivaram Neelakantan wrote:

> what then, would be a better way to use the shell for line by line
> processing? The stackexchange answer clearly says, people are
> mimicking C lang style and other issues, which I agree with. What
> should a novice do then? Pretty sure, they wouldn't know about
> paste/join/cut/comm etc which sort of makes them do all this.

People shouldn't be writing shell scripts unless they do know about the
most common mandatory POSIX tools though. Doing so would be like trying
to build a house when all you know how to use is a toolbelt and you've
never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is
designed to hold.

In general if you want to do small, simple operations then use tools
like sed, grep, cut, etc. but if you find yourself creating lengthy
and/or complicated pipelines of those or being tempted to write a shell
loop to process multi-line text then you should be using awk instead.

Again - the above is about manipulating text. If you find yourself
needing to manipulate (create/destroy) files or processes THEN a shell
loop may be appropriate (if xargs isn't a better solution).
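
As a rough sketch of what the awk route could look like for the task at the top of this thread (rewrite the _ira_worth line, pass every other line through untouched). The file names and the value 250000 are placeholders, and the other cases of the original script are not shown:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical stand-ins for the original poster's variables and input file.
_ira_worth=250000
_cons_fn=cons_202112.txt
printf '%s\n' \
    '_ira_worth=0' \
    '_medicare="$(echo "scale=2; 144.60 * 12" | bc)"' > taxes_paid.txt

# Split each line at "="; rewrite the line whose first field is _ira_worth
# and print every other line as it is. The data never passes through shell
# word splitting or file name expansion.
awk -F= -v worth="$_ira_worth" -v src="$_cons_fn" '
    $1 == "_ira_worth" { print " _ira_worth=" worth " # from " src; next }
    { print }
' taxes_paid.txt > taxes_paid.new

cat taxes_paid.new
```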

Ed.


From Sivaram Neelakantan@21:1/5 to Ed Morton on Mon Mar 7 16:39:57 2022

On Sun, Mar 06 2022, Ed Morton wrote:

[snipped 26 lines]

> People shouldn't be writing shell scripts unless they do know about the
> most common mandatory POSIX tools though. Doing so would be like trying
> to build a house when all you know how to use is a toolbelt and you've
> never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is
> designed to hold.

On that standard, no one would ever get started on shell then, would
they? Me, I started with Pike's UPE on a Linux bash shell till I saw
people getting politely chewed out for not being posixy/portable in
c.u.shell. And I'm not the only clown in this circus. And no, I
haven't seen/read one POSIX doc, though I have seen it being quoted
here.

> In general if you want to do small, simple operations then use tools
> like sed, grep, cut, etc. but if you find yourself creating lengthy
> and/or complicated pipelines of those or being tempted to write a shell
> loop to process multi-line text then you should be using awk instead.

As a low-level sysadmin thrown in the deep end of a bog-standard prod
support project decades ago, I have seen 5/10/15-year-old scripts with the
above abused paradigm. I didn't touch or change it, nor did the
retiring AT&T/Sprint/H3G/others chap who handed it over to me.
Unfortunately I used to use the template because it had been working for
so long. Talk about picking the one idea out of the 1000s of shell scripts
which was bad. :-)

> Again - the above is about manipulating text. If you find yourself
> needing to manipulate (create/destroy) files or processes THEN a shell
> loop may be appropriate (if xargs isn't a better solution).

I suspect that with no one telling you what's the best or optimal way to
save tears down the road, it's just like the mess you described. It's a
good thing that my mistakes are generally not earth-altering... so
far.

sivaram
--


From Ed Morton@21:1/5 to Sivaram Neelakantan on Mon Mar 7 06:56:49 2022

On 3/7/2022 5:09 AM, Sivaram Neelakantan wrote:
> On Sun, Mar 06 2022, Ed Morton wrote:
>
> [snipped 26 lines]
>
>> People shouldn't be writing shell scripts unless they do know about the
>> most common mandatory POSIX tools though. Doing so would be like trying
>> to build a house when all you know how to use is a toolbelt and you've
>> never heard of a hammer/screwdriver/saw/drill etc that the toolbelt is
>> designed to hold.
>
> On that standard, no one would ever get started on shell then, would
> they?

You're suggesting people are out there learning to write:

while IFS= read -r line; do
    if [[ $line =~ foo ]]; then
        echo "$line"
    fi
done < file

before they've learned:

grep 'foo' file

I don't buy it.

> Me, I started with Pike's UPE on a linux bash shell till I saw
> people getting politely chewed out for not being posixy/portable in
> c.u.shell. And I'm not the only clown in this circus. And no, I
> haven't seen/read one posix doc, though I have seen it being quoted
> here.

I'm not saying you need to read POSIX docs, I'm saying you need to have
heard of the most common tools that are required by POSIX to exist on
all Unix systems.

It's been 40+ years since I learned about Unix but as I recall the
starting point was "here's how to find text in a file" followed by grep
and ditto for when/how to call sed, cut, head, tail, etc. It was only
later we learned how to write shell scripts to glue them together and I
can't imagine how it'd make any sense to learn how to write the glue first.

Ed.


From Kenny McCormack@21:1/5 to [email protected] on Tue Mar 8 16:40:55 2022

In article <[email protected]>,
Sivaram Neelakantan <[email protected]> wrote:
...

> what then, would be a better way to use the shell for line by line
> processing? The stackexchange answer clearly says, people are mimicking
> C lang style and other issues, which I agree with. What should a novice
> do then? Pretty sure, they wouldn't know about paste/join/cut/comm etc
> which sort of makes them do all this.

I usually use mapfile (in bash) for this. mapfile reads an entire file or
the output of a process into an array (MAPFILE by default). Then you can
iterate over the array. So, you end up with:

mapfile -t < file
for i in "${MAPFILE[@]}"
do
    ...
done

Or, to do it with a process (the more common case):

mapfile -t < <(process)
for i in "${MAPFILE[@]}"
do
    ...
done
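
A runnable variant of the first form, as a sketch only. The input file and its contents are invented, and the array is given an explicit name (lines) instead of relying on the MAPFILE default:

```shell
#!/usr/bin/env bash
# mapfile needs bash 4 or later.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'first line' '  second * line' > file   # hypothetical input

# -t removes the trailing newline from each element.
mapfile -t lines < file

count=0
for i in "${lines[@]}"; do
    # Quoted expansion: leading blanks and the * are kept as they are.
    printf '<%s>\n' "$i"
    count=$((count + 1))
done
```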

--
The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/FiftyPercent


From Janis Papanagnou@21:1/5 to Kenny McCormack on Tue Mar 8 18:39:13 2022

On 08.03.2022 17:40, Kenny McCormack wrote:

> I usually use MAPFILE (in bash) for this. MAPFILE reads an entire file or
> process into an array. Then you can iterate the array. So, you end up
> with:
>
> mapfile -t < file
> for i in "${MAPFILE[@]}"
> do
> ...
> done

Nice bash feature, I didn't know it.

In other shells (like the ksh I use) I'd have to do something like

IFS=$'\n' MAPFILE=( $(< mapfile-data) )

to populate an array.
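
One caveat, in keeping with the start of this thread: the unquoted $(< ...) in that assignment undergoes file name expansion as well as word splitting. A sketch that switches globbing off around the assignment follows. It is written and checked for bash; that ksh behaves the same way is an assumption here, and the data file contents are made up:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch aa bb.txt
printf '%s\n' '144.60 * 12' '*' > mapfile-data    # hypothetical data

old_ifs=$IFS
IFS=$'\n'     # split the file contents on newlines only
set -f        # no file name expansion while the array is filled
lines=( $(< mapfile-data) )
set +f
IFS=$old_ifs

printf '<%s>\n' "${lines[@]}"
```

Unlike mapfile, this approach drops empty lines, because consecutive newlines are collapsed during word splitting.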

Janis


From William Ahern@21:1/5 to Ed Morton on Tue Mar 8 18:46:18 2022

Ed Morton <[email protected]> wrote:

<snip>

> 3) Don't use a shell loop just to manipulate text as you seem to be
> doing, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

IMO, this is not great advice.

1) If raw throughput matters, you shouldn't be using shell text processing
in the first place; use sed or awk, at the very least, instead.

2) If the input is coming from a pipe, what the read loop buys you is
concurrency and parallelism. If the process generating the input has high
latency, the concurrency can help tremendously. If either side uses a lot
of CPU, the parallelism might help performance, overcoming the
byte-by-byte issue.

Case example: last year I downloaded a company engineer-managed script that
updated routing tables, created as a workaround for a poorly managed IPSec
VPN configuration deployed on company laptops. When I ran the script it
seemed to hang, so I'd kill it and run it again. After a few minutes I
decided to dive into the script to figure out what was happening. The
fundamental problem was that they had one routine generating a list of
addresses and another routine consuming the list in a loop. Crucially, the
second routine was using the Bash'ism to slurp the input into an array for
processing using a for-loop rather than a while-read-loop. The
address-generating routine was doing network I/O to download and preprocess
the lists, which was taking considerable time. Meanwhile, the second loop
was completely idle waiting for the first to finish. The second loop also
incurred some surprisingly high latency per address (IIRC, might have been
every invocation of route(1) doing reverse DNS or some such). Long story
short, the entire script took much longer to complete than if they had used
a simple while-read-loop, permitting both loops to run concurrently. Plus,
the script would have provided immediate feedback that things were actually
progressing.
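
A small sketch of that effect, under made-up assumptions: a hypothetical producer function that emits one address per second stands in for the slow network routine, and a marker file records when it has completely finished. This is not the script from the anecdote.

```shell
#!/usr/bin/env bash
marker=$(mktemp)

# Hypothetical slow producer: one address per second, then a marker file
# to record that it has completely finished.
producer() {
    for addr in 192.0.2.1 192.0.2.2; do
        printf '%s\n' "$addr"
        sleep 1
    done
    : > "$marker"
}

# Report whether the producer had already finished when a line is handled.
state() { if [ -e "$marker" ]; then echo finished; else echo running; fi; }

# Streaming: each address is handled as soon as it is written.
rm -f "$marker"
streamed=$(producer | while IFS= read -r addr; do state; done)

# Slurping: mapfile returns only at end of input, so the loop starts late.
rm -f "$marker"
mapfile -t addrs < <(producer)
slurped=$(for addr in "${addrs[@]}"; do state; done)

printf 'while-read loop saw the producer: %s\n' "${streamed//$'\n'/ }"
printf 'slurp + for loop saw the producer: %s\n' "${slurped//$'\n'/ }"
rm -f "$marker"
```

The while-read loop handles every address while the producer is still running, whereas the for loop over the slurped array does not start until the producer has exited.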

You see something similar with the widespread adoption of map-filter-reduce
functional patterns in languages like JavaScript. The current popularity
seems to have been kicked off by admiration for Haskell-style algorithms,
which once upon a time were popular blog fodder. But Haskell uses lazy list
evaluation, unlike languages like JavaScript. The result is that the new
preferred pattern results in a tremendous amount of memory usage and churn,
as every transformation step requires constructing and populating a whole
new array. It makes for some horribly inefficient programs; inefficient in a
quite opaque way, whereas with traditional patterns the unnecessary array
duplications would be immediately obvious, particularly if reading the code
with an eye toward improving performance. (Traditional patterns also aren't
creating a bunch of closures, which can create barriers to JIT
optimization.)

Some of the old patterns--e.g. shell pipes--are far more sophisticated than
people give them credit for today. See, e.g., this 2014 paper by Doug
McIlroy, inventor of the Unix pipe, describing the equivalency between
coroutines, pipes, and lazy lists:

https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf


From Sivaram Neelakantan@21:1/5 to Kenny McCormack on Wed Mar 9 08:16:51 2022

On Tue, Mar 08 2022, Kenny McCormack wrote:

[snipped 9 lines]

> I usually use MAPFILE (in bash) for this. MAPFILE reads an entire file or
> process into an array. Then you can iterate the array. So, you end up
> with:
>
> mapfile -t < file
> for i in "${MAPFILE[@]}"
> do
> ...
> done
>
> Or, to do it with a process (the more common case):
>
> mapfile -t < <(process)
> for i in "${MAPFILE[@]}"
> do
> ...
> done

Thanks for this, this is news to me.

sivaram
--


From Ed Morton@21:1/5 to William Ahern on Wed Mar 9 06:39:02 2022

On 3/8/2022 8:46 PM, William Ahern wrote:
> Ed Morton <[email protected]> wrote:
>
> <snip>
>
>> 3) Don't use a shell loop just to manipulate text as you seem to be
>> doing, see
>> https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
>
> IMO, this is not great advice.
>
> 1) If raw throughput matters, you shouldn't be using shell text processing
> in the first place; use sed or awk, at the very least, instead.

That's the same advice.

I'm not sure what you're saying below. It sounds like you're discussing
some bad software you came across that you improved by replacing a
couple of for loops with a while loop but obviously that doesn't mean it
couldn't have been further improved by using, say, awk instead. Can you
provide a concise sample shell script that clearly and simply just
demonstrates what you're describing below and some way of generating
sample input to help us understand what you're describing and so we can
test it?

Ed.

> 2) If the input is coming from a pipe, what the read loop buys you is
> concurrency and parallelism. If the process generating the input has high
> latency, the concurrency can help tremendously. If either side uses a lot
> of CPU, the parallelism might help performance, overcoming the
> byte-by-byte issue.
>
> Case example: last year I downloaded a company engineer-managed script
> that updated routing tables, created as a workaround for a poorly managed
> IPSec VPN configuration deployed on company laptops. When I ran the script
> it seemed to hang, so I'd kill it and run it again. After a few minutes I
> decided to dive into the script to figure out what was happening. The
> fundamental problem was that they had one routine generating a list of
> addresses and another routine consuming the list in a loop. Crucially, the
> latter, second routine was using the Bash'ism to slurp the input into an
> array for processing using a for-loop rather than a while-read-loop. The
> address-generating routine was doing network I/O to download and
> preprocess the lists, which was taking considerable time. Meanwhile, the
> second loop was completely idle waiting for the first to finish. The
> second loop also incurred some surprisingly high latency per address
> (IIRC, might have been every invocation of route(1) doing reverse DNS or
> some such). Long story short, the entire script took much longer to
> complete than if they had used a simple while-read-loop, permitting both
> loops to run concurrently. Plus, the script would have provided immediate
> feedback that things were actually progressing.
>
> You see something similar with the widespread adoption of
> map-filter-reduce functional patterns in languages like JavaScript. The
> current popularity seems to have been kicked off by admiration for
> Haskell-style algorithms, which once upon a time were popular blog fodder.
> But Haskell uses lazy list evaluation, unlike languages like JavaScript.
> The result is that the new preferred pattern results in a tremendous
> amount of memory usage and churn, as every transformation step requires
> constructing and populating a whole new array. It makes for some horribly
> inefficient programs; inefficient in a quite opaque way, whereas with
> traditional patterns the unnecessary array duplications would be
> immediately obvious, particularly if reading the code with an eye toward
> improving performance. (Also aren't creating a bunch of closures, which
> can create barriers to JIT optimization.)
>
> Some of the old patterns--e.g. shell pipes--are far more sophisticated
> than people give them credit for today. See, e.g., this 2014 paper by Doug
> McIlroy, inventor of the Unix pipe, describing the equivalency between
> coroutines, pipes, and lazy lists:
>
> https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf


From Janis Papanagnou@21:1/5 to Ed Morton on Wed Mar 9 22:38:43 2022

On 09.03.2022 13:39, Ed Morton wrote:
> On 3/8/2022 8:46 PM, William Ahern wrote:
>
> I'm not sure what you're saying below. It sounds like you're discussing
> some bad software you came across that you improved by replacing a
> couple of for loops with a while loop but obviously that doesn't mean it
> couldn't have been further improved by using, say, awk instead. Can you
> provide a concise sample shell script that clearly and simply just
> demonstrates what you're describing below and some way of generating
> sample input to help us understand what you're describing and so we can
> test it?

What I had associated with the described text was...

1) for loop - that reads completely constructed data - gets replaced
by sequential and parallelisable processing pipe, e.g.

for f in `ls` # often seen [bad] pattern
for f in * # implicitly also sorting

(note: the "lazy evaluation" concept that the poster mentioned, if
implemented in shell, could probably handle both cases as well)

vs.

ls | while read

That example certainly does not describe the intention accurately, since
'ls' implicitly also sorts and therefore has to store all elements.
So replace 'ls' by 'some_arbitrary_non_buffering_data_generator'.

2)
Then the hypothesis that the slow (character-wise read) 'while read'
could be negligible [in certain cases] if the left-hand-side process
requires more execution time than the character-wise read.
So replace 'ls' by 'some_arbitrary_non_buffering_slow_data_generator'.

3)
Many processes can be parallelised on modern systems (scheduling on
multi-core or multi-CPU systems), so

x | y | z # may run in parallel

(note: the individual processes may also slow down the pipe when
storing huge amounts of data; e.g. in cases like using 'sort')

vs.

xyz # monolithic tool (hypothesis: non-parallelisable)

(note: the latter presumes that the processing steps have to be
or are implemented in a linear, sequential way in 'xyz')

4)
Finally, co-processes are mentioned (not directly related), which are
directly supported by the modern powerful shells (or use external Unix tools).

This is my interpretation of the text. (The author will correct any
misunderstandings, I hope.) Note: I am just interpreting, not valuing
what has been said (or what I think had been said).

Janis

