• Re: printing words without newlines?

    From Bruce Horrocks@21:1/5 to David Chmelik on Sun May 12 09:52:51 2024
    XPost: alt.comp.lang.awk

    On 12/05/2024 05:57, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)... is this normal (and I made mistake?) or am I
    approaching it wrong? I recall BASIC prints new lines, but as I learned
    basic C and some derivatives, I'm used to newlines only being
    specified...
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to

    You need to set ORS in the BEGIN { } section (or on the command line).

    See
    <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
    for an example - just replace the "\n\n" in the example with " " to see
    the effect you are looking for.
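
    A minimal sketch of that suggestion, assuming a POSIX shell with some
    awk on the PATH. The three input lines are a cut-down stand-in for
    data.txt, and the closing printf in END is an addition so that the shell
    prompt starts on a fresh line:

```shell
# ORS is set once in BEGIN, so every print ends with a space instead of a newline.
out=$(printf '2 your\n1 all\n3 base\n' |
      awk 'BEGIN { ORS = " " } { print $2 } END { printf "\n" }')
# The brackets only make the trailing space visible.
printf '[%s]\n' "$out"
```

    The words come out in input order with a trailing space after the last
    one; nothing here sorts them.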

    --
    Bruce Horrocks
    Surrey, England

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Horrocks@21:1/5 to Bruce Horrocks on Sun May 12 09:55:52 2024
    XPost: alt.comp.lang.awk

    On 12/05/2024 09:52, Bruce Horrocks wrote:
    On 12/05/2024 05:57, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print.  I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)... is this normal (and I made mistake?) or am I
    approaching it wrong?  I recall BASIC prints new lines, but as I learned
    basic C and some derivatives, I'm used to newlines only being
    specified...
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to

    You need to set ORS in the BEGIN { } section (or on the command line).

    See
    <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
    for an example - just replace the "\n\n" in the example with " " to see
    the effect you are looking for.


    Let me re-phrase that: it would be better to set ORS in the BEGIN {}
    section. I'm not sure why yours is not working but with some commented
    out code and some not, your example is unclear.

    If what I have suggested doesn't work for you then please re-post your
    exact code.

    --
    Bruce Horrocks
    Surrey, England

  • From Kenny McCormack@21:1/5 to [email protected] on Sun May 12 12:11:27 2024
    XPost: alt.comp.lang.awk

    In article <[email protected]>,
    Bruce Horrocks <[email protected]> wrote:
    ...
    You need to set ORS in the BEGIN { } section (or on the command line).

    This is demonstrably false. You can set ORS whenever/wherever you want.
    Whatever value it has when a plain "print" statement is executed is what
    will be used. You are probably thinking about the various variables
    that affect input parsing. These variables clearly must be set prior to
    the reading of the input, which usually means they need to be set in
    BEGIN (or via something like -F or -v on the command line).

    One of my favorite idioms (and one that might actually be useful to OP) is:

    # Print every 3 input lines as a single output line
    # Yes, this single line is the whole program!
    ORS = NR % 3 ? " " : "\n"
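
    A quick way to try the idiom, sketched under the assumption of a POSIX
    shell; the printf is only a stand-in for a real input file. The
    assignment doubles as the pattern: its value is a non-empty string,
    hence true, so each record is printed with the ORS that was just set.

```shell
# Seven input lines, grouped three per output line.
out=$(printf '%s\n' 1 2 3 4 5 6 7 | awk 'ORS = NR % 3 ? " " : "\n"')
printf '[%s]\n' "$out"
```

    Since 7 is not a multiple of 3, the last group ends with a space rather
    than a newline (the brackets make that visible).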

    See
    <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
    for an example - just replace the "\n\n" in the example with " " to see
    the effect you are looking for.

    Of course, the whole point of this thread is that none of us has any idea
    what OP is talking about or what his actual problem is. We can only guess...

    --
    "It does a lot of things half well and it's just a garbage heap of ideas that are
    mutually exclusive."

    - Ken Thompson, on C++ -

  • From [email protected]@21:1/5 to All on Sun May 12 18:22:05 2024
    XPost: alt.comp.lang.awk

    <snip>
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)... is this normal (and I made mistake?) or am I
    approaching it wrong? I recall BASIC prints new lines, but as I learned
    basic C and some derivatives, I'm used to newlines only being
    specified...
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    <snip>

    I think you forgot that arr2 is now an array => you have to iterate over
    it as well. There were also a few other coding errors, ie. not closing
    the data.txt file; not declaring local vars in print_file_words:

    --
    $ cat test.awk
    BEGIN { print_file_words("data.txt") }

    function print_file_words(file, i,j) {
        ORS = " "
        PROCINFO["sorted_in"]="@ind_num_asc"
        while (getline <file >0)
            arr[$1] = $0
        close (file)

        for(i in arr) {
            split(arr[i],arr2)
            for (j in arr2)
                print arr2[j]
        }
        ORS = "\n"
        print ""
    }

    $ gawk -f test.awk
    all are base belong to us your
    --

    Probably this is not the best way of doing things but I think you're
    mainly just experimenting with sorting/printing so..

  • From [email protected]@21:1/5 to All on Mon May 13 04:50:39 2024
    XPost: alt.comp.lang.awk

    <snip>
    My original works after rebooting after discussion in main thread
    (without 'Re') but thanks for instruction to close file, though I don't
    know you need to pass in i--not used outside. It's odd iterating over
    arr2 even still prints all words (wrong order) because the way I used
    arr2 it only ever had one number and one word--its point was to split
    out & get word, then for the next i, it's split again onto arr2 which
    is erased/updated.

    You're right that in your particular data case --one word per line--
    arr2 is always of length 1 => you could use arr2[1]. But creating
    the arr2 array via split() isn't even necessary since arr will print
    out in the order specified in PROCINFO["sorted_in"]:
    --
    $ cat test.awk
    BEGIN { print_file_words("data.txt") }

    function print_file_words(file, i) {
        ORS = " "
        PROCINFO["sorted_in"]="@ind_num_asc"
        while (getline <file >0)
            arr[$1] = $0
        close (file)

        for(i in arr)
            print arr[i]
        ORS = "\n"
        print ""
    }

    $ gawk -f test.awk data.txt
    all are base belong to us your
    --

    WRT close() you should do it whenever you're finished reading from a
    file OR command. WRT user-defined functions, variables intended to be
    local to the function should be declared otherwise they become global
    variables; try removing the "i" from the function print_file_words()
    definition and tacking on the following to your code:

    END { print "i =", i }

    which will print "i = your" as the last line of output.
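
    The same effect can be seen in isolation. This is only a sketch: the
    function names leaky() and tidy() are made up for the demo, and it
    assumes nothing beyond a POSIX awk. Extra names in the parameter list
    are how awk spells "local variable".

```shell
# leaky() assigns to i without declaring it, so it clobbers the global i.
# tidy() lists i as an extra parameter, which makes it local to the function.
out=$(awk '
function leaky()      { i = 42 }
function tidy(    i)  { i = 42 }
BEGIN {
    i = "untouched"
    tidy();  print "after tidy:  i = " i
    leaky(); print "after leaky: i = " i
}')
printf '%s\n' "$out"
```

    Only the call to leaky() changes the caller's i.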

    Have fun,
    -j

  • From Kenny McCormack@21:1/5 to [email protected] on Mon May 13 06:56:50 2024
    XPost: alt.comp.lang.awk

    In article <v1pi7c$2b87j$[email protected]>,
    David Chmelik <[email protected]> wrote:
    ...
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to

    I guess this is what you actually want:

    { A[$1] = $2 }
    END {
        len = length(A)
        for (i=1; i<=len; i++)
            printf("%s%s",A[i],i<len ? " " : "\n")
    }

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Noam

  • From Janis Papanagnou@21:1/5 to David Chmelik on Mon May 13 10:18:40 2024
    XPost: alt.comp.lang.awk

    On 12.05.2024 06:57, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)... is this normal (and I made mistake?) or am I
    approaching it wrong? I recall BASIC prints new lines, but as I learned
    basic C and some derivatives, I'm used to newlines only being
    specified...

    IIUC you meanwhile have your script running, and probably code similar
    to

    BEGIN { print_file_words("data.txt"); }

    function print_file_words(file) {
        while (getline <file >0)
            arr[$1] = $0
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in arr) {
            split (arr[i], arr2)
            printf "%s ", arr2[2]
        }
        printf "\n"
    }

    I suggest adding the '>0' test to your code, and also printing a final
    "\n" so that your command line prompt doesn't overwrite your output.
    Note also that printf (like print) is a command, not a function. Adding
    local variable declarations is also sensible to avoid problems if
    you operate your code in other source code contexts.
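
    Why the '>0' test matters, as a small sketch: getline returns 1 for a
    record, 0 at end of file and -1 on error, and -1 counts as true in a
    bare "while (getline < file)". The path below is an assumption, chosen
    because it should not exist; the extra parentheses around getline are
    just defensive style.

```shell
out=$(awk 'BEGIN {
    file = "/nonexistent/data.txt"      # assumed to be unreadable
    status = (getline line < file)      # 1 = record read, 0 = EOF, -1 = error
    print "getline returned " status
    n = 0
    while ((getline line < file) > 0)   # stops on EOF and on error alike
        n++
    print "records read: " n
}')
printf '%s\n' "$out"
```

    Without the comparison, the loop on a missing file would never
    terminate.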

    Janis

    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to


  • From Kenny McCormack@21:1/5 to Kenny McCormack on Mon May 13 14:53:38 2024
    XPost: alt.comp.lang.awk

    In article <v1sdji$tofu$[email protected]>,
    Kenny McCormack <[email protected]> wrote:
    ...
    I guess this is what you actually want:

    { A[$1] = $2 }
    END {
        len = length(A)
        for (i=1; i<=len; i++)
            printf("%s%s",A[i],i<len ? " " : "\n")
    }

    Improved version:

    { A[$1] = $2 }
    END {
        for (i=1; i<=NR; i++)
            printf("%s%s",A[i],i<NR ? " " : "\n")
    }

    Note that the value of NR in END is sort of a gray area, but it works as
    expected in GAWK, which is really all we care about.

    --
    [Donald] Trump didn't have it all handed to him by his parents,
    like Hillary Clinton did.

    - Some dumb cluck in Ohio; featured in Michael Moore's "Trumpland" -

  • From Kaz Kylheku@21:1/5 to Kenny McCormack on Mon May 13 16:49:59 2024
    XPost: alt.comp.lang.awk

    On 2024-05-12, Kenny McCormack <[email protected]> wrote:
    In article <[email protected]>,
    Bruce Horrocks <[email protected]> wrote:
    ...
    You need to set ORS in the BEGIN { } section (or on the command line).

    This is demonstrably false. You can set ORS whenever/wherever you want.
    Whatever value it has when a plain "print" statement is executed is what
    will be used. You are probably thinking about the various variables
    that affect input parsing. These variables clearly must be set prior to
    the reading of the input, which usually means they need to be set in
    BEGIN (or via something like -F or -v on the command line).

    One of my favorite idioms (and one that might actually be useful to OP) is:

    # Print every 3 input lines as a single output line
    # Yes, this single line is the whole program!
    ORS = NR % 3 ? " " : "\n"

    See
    <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
    for an example - just replace the "\n\n" in the example with " " to see
    the effect you are looking for.

    Of course, the whole point of this thread is that none of us has any idea
    what OP is talking about or what his actual problem is. We can only guess...

    The problem seems to be that there is a file of words preceded by
    unique integer ranks which indicate the order. They are to be reproduced
    in rank order, on one line.

    This is the TXR Lisp interactive listener of TXR 294.
    Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
    Self-assembly keeps TXR costs low; but ask about our installation service!
    (flow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
    all your base are belong to us

    We can insert prints into the pipeline to see the transformations:

    (flow "data.txt"
      prinl
      file-get-lines
      prinl
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      prinl
      transpose
      prinl
      (select (second @1) (first @1))
      prinl
      (join-with " ")
      prinl
      put-line)
    "data.txt"
    ("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
    (#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
    #(5 "to"))
    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
    #("all" "your" "base" "are" "belong" "to" "us")
    "all your base are belong to us"
    all your base are belong to us
    t

    That is tedious; say, why not make a macro dflow (debug flow) which
    inserts those prinl's for us?

    (defmacro dflow (. args)
      ^(flow ,*(interpose 'prinl args)))
    dflow

    Sanity check: is it inserting prinls?

    (macroexpand-1 '(dflow a b c d))
    (flow a prinl
    b prinl c prinl
    d)

    Use dflow:

    (dflow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
    "data.txt"
    ("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
    (#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
    #(5 "to"))
    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
    #("all" "your" "base" "are" "belong" "to" "us")
    "all your base are belong to us"
    all your base are belong to us
    t

    After file-get-lines we have a list of strings like "2 your".

    We map those through an anonymous function which matches the
    string pattern `@a @b` to capture the space-separated text pieces.
    A is converted to integer and mapped to its predecessor
    (because we want to use it as an index, and indexing is zero based).
    We map each string to a two element vector consisting of the
    zero-based index as an integer type, and a string, so now we have:

    (#(1 "your") #(0 "all") ...)

    #(a b c) is a vector notation.

    Then we want to transpose rows to columns to get the integer
    column as a vector, and the values as a vector.

    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))

    Now we use the built-in function select which selects elements out
    of a sequence, based on indices supplied in another sequence.

    Now we have the vector of words in the right order; we just
    join with a space.

  • From Kenny McCormack@21:1/5 to All on Mon May 13 17:26:56 2024
    XPost: alt.comp.lang.awk

    In article <[email protected]>,
    Kaz Kylheku <[email protected]> wrote:
    ...
    (This version more complicated than it needs to be, but essentially the
    same as what I posted earlier)
    $ awk '{
        if ($1 > max) max = $1;
        rank[$1] = $2
      }

      END {
        for (i = 1; i <= max; i++)
          if (i in rank) {
            printf("%s%s", sep, rank[i]);
            sep = " "
          }
        print ""
      }' data.txt
    all your base are belong to us

    We do not perform any sort, and so we don't require GNU extensions. Sorting is

    But GNU extensions are good - especially since OP specifically mentioned
    using GAWK. And much more on-topic than Lisp (et al).

    Final note: In fact, it has been established (on this newsgroup as well
    as empirically by me and others) that if the indices are small integers,
    you get sorting for free (in GAWK, which, as noted, is all we care
    about). So, you don't even really need to mess with PROCINFO[]...

    And, one more note about sorting. Some responders on this thread have
    gotten confused about what is to be sorted. They assumed that OP wanted
    the words sorted (alphabetically), when, in fact, he just wants them
    sorted (numerically) by the position number (the first field in the data
    line).

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Mandela

  • From Kaz Kylheku@21:1/5 to David Chmelik on Mon May 13 17:17:05 2024
    XPost: alt.comp.lang.awk

    On 2024-05-12, David Chmelik <[email protected]> wrote:
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to

    $ awk '{
        if ($1 > max) max = $1;
        rank[$1] = $2
      }

      END {
        for (i = 1; i <= max; i++)
          if (i in rank) {
            printf("%s%s", sep, rank[i]);
            sep = " "
          }
        print ""
      }' data.txt
    all your base are belong to us

    We do not perform any sort, and so we don't require GNU extensions.
    Sorting is silly, because data is already sorted: we are given the
    positional rank of every word, which is a way of capturing order. All
    we have to do is visit the words in that order.

    We can do that by iterating an index i from 1 to the highest index
    we have seen. If there is a rank[i] entry, then we print it.
    (We do this "(i in rank)" check in case there are gaps in the rank
    sequence.)

    After we print one word, we start using the " " separator before all
    subsequent words.

    If we must sort, there is the sort utility:

    $ sort -n data.txt | awk '{ printf("%s%s", sep, $2); sep = " " }' && echo
    all your base are belong to us

    Also, if we can suffer a spurious trailing space:

    $ sort -n data.txt | awk '{ print $2 }' | tr '\n' ' ' && echo
    all your base are belong to us

  • From Kaz Kylheku@21:1/5 to Kenny McCormack on Mon May 13 23:33:07 2024
    XPost: alt.comp.lang.awk

    On 2024-05-13, Kenny McCormack <[email protected]> wrote:
    In article <[email protected]>,
    Kaz Kylheku <[email protected]> wrote:
    ...
    (This version more complicated than it needs to be, but essentially the
    same as what I posted earlier)
    $ awk '{
        if ($1 > max) max = $1;
        rank[$1] = $2
      }

      END {
        for (i = 1; i <= max; i++)
          if (i in rank) {
            printf("%s%s", sep, rank[i]);
            sep = " "
          }
        print ""
      }' data.txt
    all your base are belong to us

    We do not perform any sort, and so we don't require GNU extensions. Sorting is

    But GNU extensions are good - especially since OP specifically mentioned using GAWK. And much more on-topic than Lisp (et al).

    The above performs O(N) steps, whereas sorting is O(N log N),
    and sometimes worse due to degenerate cases in some algorithms.

    Why use an extension that only makes the program more verbose and brings
    in an unnecessary algorithm?

    Final note: In fact, it has been established (on this newsgroup as well
    as empirically by me and others) that if the indices are small integers,
    you get sorting for free (in GAWK, which, as noted, is all we care
    about). So, you don't even really need to mess with PROCINFO[]...

    Are you referring to the idea of just replacing the above for + if
    structure with:

    for (i in rank) {

    }

    and relying on the small integer indices being hashed in order?

    Where is that documented? The manual reiterates that this is not
    specified: "By default, the order in which a ‘for (indx in array)’ loop
    scans an array is not defined; it is generally based upon the internal
    implementation of arrays inside awk."

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Janis Papanagnou@21:1/5 to Ed Morton on Thu May 16 15:55:35 2024
    XPost: alt.comp.lang.awk

    On 16.05.2024 15:11, Ed Morton wrote:
    On 5/11/2024 11:57 PM, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)...

    [...]
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "

    Move the above to a BEGIN section so it is executed once total instead
    of once per input line.

    A function definition called once from the BEGIN section isn't
    called "once per input line".
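
    A small sketch to check this, assuming any POSIX awk. The function
    setup() and both counters are invented for the demo; printf supplies
    three input lines.

```shell
# setup() is called from BEGIN only; the main rule runs once per input line.
out=$(printf 'a\nb\nc\n' | awk '
function setup() { calls++ }
BEGIN { setup() }
      { lines++ }
END   { print "calls=" calls, "lines=" lines }')
printf '%s\n' "$out"
```

    The function body runs once even though three records are read.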

    Janis


    [...]
