First, I cannot really decipher what you actually want to do or
where your problems are. The usual procedure is to post sample data:
at least the input data and the corresponding output data (not shell
code that creates the input data). Anyway, below you will find some
hints and suggestions...
On 12.03.2023 01:06, Bryan wrote:
I'm using gawk 5.1.0, bash 5.1.16, Ubuntu 22.04.2. I am providing a
lot of material in case it is useful or something in the script
conflicts, but I am trying not to ramble.
I prepared a test script below, which should be easy to copy/paste
into a shell, e.g. bash. I am focused on the gsub regexps, which are
obviously contrived to replace all these different strings. The
strings vary, since they come from another program's output, but they
take the general form (attempting a "plain English" version):
[opening double quote][the word "path"][maybe an underscore][various
digits][closing double quote], followed by the :[ that opens the array
I want to delete all of that ^^^ (or, equivalently, replace it with
nothing) to prepare two-column input for gnuplot as "x,y" or "x y"
data.
I tried using this type of command:
gsub("^[a-z]{4}$","TEST") ;
This is fine for replacing lines that contain _only_ a sequence of
four lowercase letters with "TEST". Without the ^ and $ anchors,
gsub() replaces every occurrence of that pattern anywhere on the line.
You can provide a third argument to gsub() to operate on a variable
or a specific field; in that case the anchors ^ and $ match the
beginning and end of that variable or field, respectively.
It is also advantageous to use the /.../ syntax for constant patterns
instead of the string form "...". A string is first unescaped as a
string and only then used as a regexp, so every backslash meant for
the regexp has to be doubled, e.g. "\\." instead of /\./.
... and more, e.g. trying sub and gensub, but I did not get far. I am
aware of a curly brace escape that is important or not depending on
the awk version, so I also tried with \{ and \}.
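There's no need to escape these braces. In gawk 4.0 and later,
interval expressions such as {4} are enabled by default (older gawk
needed --re-interval), and \{ and \} would match literal brace
characters instead.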
I put in "TEST" as the replacement for now to check a few different
cases. I wrote this script based on extensive reading of a certain
popular online resource and The AWK Programming Language (1988; maybe
time for a newer edition?). This is a useful script because, as I find
new types of output from the upstream program (a whole other story),
I can add new gsub commands to take care of them.
copy/paste example script:
echo "\
{\"path_1234567\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path_123456\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path_1234\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path1234\"\
:[`seq -s',' -f '%f' 1 20 `]}" | \
gawk -F, '
{
gsub("\{","") ;
gsub("\}","") ;
gsub("\]","") ;
gsub("^[a-z]{4}$","TEST") ;
gsub("\"[a-z][a-z][a-z][a-z]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9]\":\\\[","TESTSEVEN") ;
gsub("\"[a-z][a-z][a-z][a-z][0-9][0-9][0-9][0-9][0-9][0-9]\":\\\[","TESTSIX") ;
gsub("\"[a-z][a-z][a-z][a-z][0-9][0-9][0-9][0-9]\":\\\[","TESTFOURB") ;
gsub("\"[a-z][a-z][a-z][a-z]_[0-9][0-9][0-9][0-9]\":\\\[","TESTFOURA") ;
for (i=1;i<=NF;i++)
{
printf("%s%s",$i,i%2?",":"\n")
}
}'
Instead of echo arguments with quotes and newline escapes, I suggest
using a shell here-document with this syntax:
awk '
# ... your awk program ...
...
' <<EOT
your data line 1
your data line 2
...
EOT
and, with the more contemporary $(...) instead of backticks, a data
line might be
{"path_1234567":[$(seq -s',' -f '%f' 1 20)], ...
but rather than calling seq many times, I would call it only once,
assign the result to a variable and use that repeatedly:
s=$(seq -s',' -f '%f' 1 20)
awk '
...
' <<EOT
{"path_1234567":[${s}], ...
...
EOT
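A runnable variant of that template, shortened to four numbers per
array; the function name heredoc_demo is hypothetical. Because the
EOT delimiter is unquoted, the shell expands ${s}, and the double
quotes in the data need no backslashes (writing <<'EOT' would switch
the expansion off):

```shell
#!/bin/sh
# Sketch: generate the numbers once with seq, reuse them in a here-document.
heredoc_demo() {
    s=$(seq -s',' -f '%f' 1 4)   # 1.000000,2.000000,3.000000,4.000000
    # Unquoted EOT: ${s} is expanded; the quotes in the data stay literal.
    awk -F, '{ print NF " fields:", $0 }' <<EOT
{"path_1234567":[${s}],"path1234":[${s}]}
EOT
}

heredoc_demo   # prints the field count (8) followed by the expanded line
```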
If you pipe in or redirect other input, just omit everything from
<<EOT onward:
data_from_some_process | awk '...'
awk '...' < data_from_some_file
(For testing, though, here-documents have advantages.)
... the last printf thing is perhaps for another post, but (IIUC) it
matches every 2nd comma and replaces it with a newline.
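printf doesn't replace anything. With -F, the commas are field
separators and never appear in $i; the loop prints each field followed
by a comma if the field number is odd, or by a newline if it is even.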
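The field/separator logic in isolation, as a sketch with a
hypothetical helper pairs. The extra NF % 2 check is an addition, not
part of your script: with an odd number of fields your loop would
leave the last line without a newline.

```shell
#!/bin/sh
# Sketch: print comma-separated fields two per line ("x,y" pairs).
# With -F, the commas are consumed by field splitting, so $i never
# contains one; the separator comes from the conditional expression.
pairs() {
    printf '%s\n' "$1" | awk -F, '{
        for (i = 1; i <= NF; i++)
            printf("%s%s", $i, i % 2 ? "," : "\n")   # "," after odd, newline after even
        if (NF % 2)
            printf("\n")   # odd field count: end the unfinished last line
    }'
}

pairs "1,2,3,4"   # prints 1,2 and 3,4 on two lines
pairs "1,2,3"     # prints 1,2 and 3, on two lines
```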
So that's the
"x,y" data idea. I hope that is clear. I imagine the regexps in the [a-z][0-9] parts could all go into one gsub if I knew
the syntax or what to read about.
To match more than one regexp with the _same_ replacement, you can
combine them with the | (or) operator. For an example from your
code above, gsub(/{|}|]/, "") removes those three braces/brackets in
one expression. A bracket expression such as gsub(/[]{}]/, "") does
the same; a ] placed first inside [...] is taken literally.
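Both forms side by side, as a sketch (strip_brackets is a made-up
name). The braces in the alternation are escaped here just to be on
the safe side, since outside a bracket expression a { can start an
interval:

```shell
#!/bin/sh
# Sketch: remove { } and ] in one gsub() call, once with alternation
# and once with a bracket expression; both lines should be identical.
strip_brackets() {
    printf '%s\n' "$1" | awk '{
        a = b = $0
        gsub(/\{|\}|\]/, "", a)   # alternation: three one-character regexps
        gsub(/[]{}]/, "", b)      # bracket expression: "]" first is literal
        print a
        print b
    }'
}

strip_brackets '{"p":[1,2]}'   # prints "p":[1,2 twice
```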
With your samples above you can also use other regexp syntax, like ?
for optional parts, and group longer subexpressions with parentheses
(...), e.g.
[a-z]{4}_?[0-9]{4}([0-9]{2})?
for an optional underscore and two optional digits. Your first sample
has seven digits, though; [0-9]+ (one or more digits) or the interval
[0-9]{4,7} covers all four of your samples.
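Putting it together, a sketch of the whole conversion with a single
gsub() for all the "path" markers (to_xy is a hypothetical name). It
uses [0-9]+ instead of the interval variant, which assumes the markers
never contain anything but digits after the optional underscore:

```shell
#!/bin/sh
# Sketch: turn the sample input into two-column "x,y" lines for gnuplot.
# One regexp covers "path1234", "path_1234", "path_123456", "path_1234567":
#   "path   literal text, including the opening double quote
#   _?      optional underscore
#   [0-9]+  one or more digits (any length, so no per-length variants)
#   ":\[    closing quote, colon and opening bracket
to_xy() {
    awk -F, '{
        gsub(/"path_?[0-9]+":\[/, "")   # delete every "pathNNN":[ marker
        gsub(/[]{}]/, "")               # delete the remaining { } ]
        for (i = 1; i <= NF; i++)
            printf("%s%s", $i, i % 2 ? "," : "\n")
    }'
}

s=$(seq -s',' 1 4)   # 1,2,3,4

to_xy <<EOT
{"path_1234567":[${s}],"path_123456":[${s}],"path_1234":[${s}],"path1234":[${s}]}
EOT
```

As in your original script, the values of consecutive arrays simply
continue the x,y sequence; with an odd number of values per array the
pairs would straddle two arrays.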
Janis
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)