• tar files sort order by date or numeric name

    From Janis Papanagnou@21:1/5 to All on Fri May 6 18:41:31 2022
    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    On a quick search and man page inspection I couldn't see anything.

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Axel Reichert@21:1/5 to Janis Papanagnou on Fri May 6 19:38:43 2022
    Janis Papanagnou <[email protected]> writes:

    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    I assume that something along

    ls -tr dir-with-files/ | xargs tar cf sorted.tgz

    is too brittle for you?

    Best regards

    Axel

  • From marrgol@21:1/5 to Janis Papanagnou on Fri May 6 20:10:21 2022
    On 06/05/2022 at 18.41, Janis Papanagnou wrote:
    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    On a quick search and man page inspection I couldn't see anything.

    Here a quick man page inspection reveals:

    “--sort=ORDER
    When creating an archive, sort directory entries according
    to ORDER, which is one of none, name, or inode.”


    --
    mrg

  • From Christian Weisgerber@21:1/5 to Janis Papanagnou on Fri May 6 17:57:09 2022
    On 2022-05-06, Janis Papanagnou <[email protected]> wrote:

    I want the files in a tar archive in sorted form. (Using GNU tar.)

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    Various tar(1) implementations can read a list of files to archive.

    $ ls | sort >list
    $ tar -c -I list -f sorted.tar

    GNU tar also supports this.

    $ gtar -c -T list -f sorted.tar

    --
    Christian "naddy" Weisgerber [email protected]

  • From Janis Papanagnou@21:1/5 to marrgol on Sat May 7 03:12:51 2022
    On 06.05.2022 20:10, marrgol wrote:
    On 06/05/2022 at 18.41, Janis Papanagnou wrote:
    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    On a quick search and man page inspection I couldn't see anything.

    Here a quick man page inspection reveals:

    “--sort=ORDER
    When creating an archive, sort directory entries according
    to ORDER, which is one of none, name, or inode.”

    That's what I also had found in the man page, and none of the three
    options will sort by date or by name with a variable-length
    numeric component. With 'name' the order would be
    rfc7168.txt
    rfc748.txt
    rfc8774.txt
    and with the other options arbitrary w.r.t. the stated requirement.
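
A minimal sketch of the difference described above, assuming GNU sort is available. It only sorts the three names from the question (no archive is created); the variable names are made up for illustration.

```shell
# Illustration only: the three names from the question, sorted two ways.
names='rfc748.txt
rfc7168.txt
rfc8774.txt'

# Plain byte-wise order: "rfc71..." sorts before "rfc74...".
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# GNU sort's version sort compares embedded digit runs as numbers.
natural=$(printf '%s\n' "$names" | LC_ALL=C sort --version-sort)

printf 'name order:\n%s\n\nversion order:\n%s\n' "$lexical" "$natural"
```

The second ordering is the one asked for in the original post.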

    Janis

  • From Janis Papanagnou@21:1/5 to Axel Reichert on Sat May 7 03:08:31 2022
    On 06.05.2022 19:38, Axel Reichert wrote:
    Janis Papanagnou <[email protected]> writes:

    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    I assume that something along

    ls -tr dir-with-files/ | xargs tar cf sorted.tgz

    is too brittle for you?

    Too brittle? - Hmm.. - thinking about what happens if the arguments'
    length will result in more than one call of tar triggered by xargs.
    But I suppose using also the tar option to add to an existing archive
    will solve that issue.
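
A rough sketch of the append idea, assuming GNU tar and file names without whitespace (parsing ls output is fragile otherwise). The scratch directory, file names and dates are hypothetical. The archive is left uncompressed because tar cannot append to a compressed archive.

```shell
# Hypothetical scratch area with three files whose mtimes differ.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir dir-with-files
touch -t 200301010000 dir-with-files/a.txt
touch -t 200101010000 dir-with-files/b.txt
touch -t 200201010000 dir-with-files/c.txt

# "r" appends, so every tar call started by xargs extends the same archive.
# -n 1 forces one call per file here, to mimic xargs splitting a long list.
( cd dir-with-files && ls -tr | xargs -n 1 tar -rf ../sorted.tar )

# Members should appear oldest first.
listing=$(tar -tf sorted.tar)
printf '%s\n' "$listing"
```

The result could be compressed afterwards in a separate step, for example with gzip.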

    Thanks.

    Janis

    Best regards

    Axel


  • From Janis Papanagnou@21:1/5 to Christian Weisgerber on Sat May 7 03:13:30 2022
    On 06.05.2022 19:57, Christian Weisgerber wrote:
    On 2022-05-06, Janis Papanagnou <[email protected]> wrote:

    I want the files in a tar archive in sorted form. (Using GNU tar.)

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    Various tar(1) implementations can read a list of files to archive.

    $ ls | sort >list
    $ tar -c -I list -f sorted.tar

    GNU tar also supports this.

    $ gtar -c -T list -f sorted.tar


    I missed that. Thanks.

    Janis

  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Sat May 7 03:32:40 2022
    On 07.05.2022 03:12, Janis Papanagnou wrote:
    On 06.05.2022 20:10, marrgol wrote:
    On 06/05/2022 at 18.41, Janis Papanagnou wrote:
    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    On a quick search and man page inspection I couldn't see anything.

    Here a quick man page inspection reveals:

    “--sort=ORDER
    When creating an archive, sort directory entries according
    to ORDER, which is one of none, name, or inode.”

    I forgot to mention that this was the place where I'd have expected
    some, say, --sort=mtime option variants. That way the call that I
    currently use to create the tar file - I'm just tar'ing the directory
    that contains the actual files - would stay simple and not require
    xargs (incl. caveats) or separate file lists as suggested elsethread.
    Needless to say, with the suggestions provided, it's just a matter of
    convenience now, but maybe also a possible --sort extension candidate.

    Janis

  • From Brian Patrie@21:1/5 to Christian Weisgerber on Sat May 7 02:04:10 2022
    Christian Weisgerber wrote:
    On 2022-05-06, Janis Papanagnou <[email protected]> wrote:

    I want the files in a tar archive in sorted form. (Using GNU tar.)

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    Various tar(1) implementations can read a list of files to archive.

    $ ls | sort >list
    $ tar -c -I list -f sorted.tar

    GNU tar also supports this.

    $ gtar -c -T list -f sorted.tar


    You can also use "-T -" to read the list of files from stdin. So:

    find dir-with-files | sort --version-sort \
    | tar -czvf sorted.tgz --sort=none --no-recursion -T -

    I'm abusing sort's "--version-sort" option to get the order that Janis
    wants (beware that this will sort decimals incorrectly--I couldn't get
    "--numeric-sort" to do the desired thing, for some unknown reason).
    "--sort=none" tells tar not to do its own sorting. "--no-recursion"
    tells tar not to do its own directory diving--which would also muck
    things up.
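
A self-contained sketch of this pipeline, assuming GNU find, sort and tar (1.28 or later for --sort). The scratch directory and the empty test files are hypothetical stand-ins for the real data.

```shell
# Hypothetical scratch area holding the three files from the question.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir dir-with-files
touch dir-with-files/rfc8774.txt dir-with-files/rfc748.txt dir-with-files/rfc7168.txt

# find lists the directory and its files, sort orders the names, and tar
# archives exactly the listed names in the listed order.
find dir-with-files | sort --version-sort \
  | tar -czf sorted.tgz --sort=none --no-recursion -T -

# Show the member order of the resulting archive.
listing=$(tar -tzf sorted.tgz)
printf '%s\n' "$listing"
```

Because find also prints dir-with-files itself, the archive gets a directory entry too; --no-recursion keeps tar from adding the directory's contents a second time.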

    find (GNU findutils) 4.7.0-git
    sort (GNU coreutils) 8.28
    tar (GNU tar) 1.29

  • From Axel Reichert@21:1/5 to Brian Patrie on Sat May 7 09:28:19 2022
    Brian Patrie <[email protected]> writes:

    You can also use "-T -" to read the list of files from stdin.

    Ah, this avoids my xargs, great!

    find dir-with-files | sort --version-sort \
    | tar -czvf sorted.tgz --sort=none --no-recursion -T -

    [...]

    "--sort=none" tells tar not to do its own sorting.

    Would this be done otherwise, even though the files are given directly
    on the command line as arguments (respectively read from STDIN) and not
    created by globbing?

    "--no-recursion" tells tar not to do its own directory diving

    Is my understanding correct that this happens only if "find" returns
    directories? So depending on the contents of Janis's "dir-with-files", a
    simple

    find dir-with-files -name "rfc*.txt"

    might do, even without "-type f".

    Best regards

    Axel

  • From Janis Papanagnou@21:1/5 to Brian Patrie on Sat May 7 15:43:07 2022
    On 07.05.2022 09:04, Brian Patrie wrote:
    Christian Weisgerber wrote:
    On 2022-05-06, Janis Papanagnou <[email protected]> wrote:

    I want the files in a tar archive in sorted form. (Using GNU tar.)

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    Various tar(1) implementations can read a list of files to archive.

    $ ls | sort >list
    $ tar -c -I list -f sorted.tar

    GNU tar also supports this.

    $ gtar -c -T list -f sorted.tar


    You can also use "-T -" to read the list of files from stdin. So:

    find dir-with-files | sort --version-sort \
    | tar -czvf sorted.tgz --sort=none --no-recursion -T -

    I'm abusing sort's "--version-sort" option to get the order that Janis
    wants (beware that this will sort decimals incorrectly--I couldn't get
    "--numeric-sort" to do the desired thing, for some unknown reason).

    With the 'sort' step I can use sort's -kn feature because of the
    regularity of the file names in this case. Thanks.

    Janis

    "--sort=none" tells tar not to do its own sorting. "--no-recursion"
    tells tar not to do its own directory diving--which would also muck
    things up.

    find (GNU findutils) 4.7.0-git
    sort (GNU coreutils) 8.28
    tar (GNU tar) 1.29

  • From Helmut Waitzmann@21:1/5 to All on Sat May 7 18:05:21 2022
    Janis Papanagnou <[email protected]>:
    I want the files in a tar archive in sorted form. (Using GNU tar.)
    Either by date of the file or by its name (containing a number).
    For example I want these three files in sorted order like depicted:
    rfc748.txt
    rfc7168.txt
    rfc8774.txt

    I can add the files incrementally one by one to an empty archive,
    but I wanted to know whether there's a trick that I missed to fill
    the archive in one go, like tar cf sorted.tgz dir-with-files/

    On a quick search and man page inspection I couldn't see anything.


    The trick of a thorough inspection of the GNU tar info manual (not
    just the manual page, but see the SEE ALSO section of the manual
    page for how to get it), will reveal the options "--no-recursion",
    "--null", and "--files-from", which you could use like in this
    example to have the file names sorted by version number:

    find dir-with-files/ -print0 |
    sort --zero-terminated --version-sort |
    tar cf sorted.tgz --no-recursion --null --files-from=-

    A rule of thumb:  Whenever you want to gain more control of what
    files in what order to be processed by GNU tar, use the
    "--no-recursion", "--null", and "--files-from" options.  Let GNU
    "find … -print0" collect the file names according to the given
    criteria, then sort them using "sort --zero-terminated …" and
    finally feed them to GNU tar.
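
A small sketch of this rule of thumb, assuming GNU find, sort and tar. The scratch directory and the file names are hypothetical; one name contains a space to show why NUL-terminated records help.

```shell
# Hypothetical scratch area; one file name contains a space on purpose.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir dir-with-files
touch 'dir-with-files/rfc7168.txt' 'dir-with-files/rfc748 draft.txt'

# NUL-terminated names pass through sort and tar unchanged, whatever
# characters they contain.
find dir-with-files/ -print0 |
  sort --zero-terminated --version-sort |
  tar -cf sorted.tar --no-recursion --null --files-from=-

# Show the member order of the resulting archive.
listing=$(tar -tf sorted.tar)
printf '%s\n' "$listing"
```

With newline-terminated lists, a name containing a newline would be split into two bogus entries; the NUL-terminated form avoids that.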

    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
    sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
    sed --zero-terminated -E -e 's/^([[:graph:]]+ ){2}//' |
    tar cf sorted.tgz --no-recursion --null --files-from=-

    Let GNU find list the filenames, each of them prepended by its data
    modification time, then let GNU sort sort the list of the filenames
    by the prepended time stamps, then let GNU sed remove the prepended
    timestamps from the filenames and finally feed them to GNU tar.

    Thanks to the "-printf" GNU find predicate, the "--zero-terminated"
    GNU sort and GNU sed options, and the "--null" GNU tar options that
    will work with any filename.
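
A runnable sketch of the modification-time variant, assuming GNU find, sort, sed and tar. The scratch directory, file names and dates are hypothetical, chosen so that name order and time order disagree. The sed step is written as a plain substitution that strips the two timestamp fields.

```shell
# Hypothetical scratch area; mtimes deliberately disagree with name order.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir dir-with-files
touch -t 200301010000 dir-with-files/a.txt
touch -t 200101010000 dir-with-files/b.txt
touch -t 200201010000 dir-with-files/c.txt

# Prefix each name with its mtime, sort on the prefix, strip it, archive.
TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
  sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
  sed --zero-terminated -E -e 's/^([[:graph:]]+ ){2}//' |
  tar -cf sorted.tar --no-recursion --null --files-from=-

# Files should appear oldest first: b, c, a.
listing=$(tar -tf sorted.tar)
printf '%s\n' "$listing"
```

The directory entry itself lands last here, since creating the files gave it the most recent modification time.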

  • From Janis Papanagnou@21:1/5 to Helmut Waitzmann on Sat May 7 19:23:02 2022
    On 07.05.2022 18:05, Helmut Waitzmann wrote:

    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
    sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
    sed --zero-terminated -E -e 's/^([[:graph:]]+ ){2}//' |
    tar cf sorted.tgz --no-recursion --null --files-from=-

    Thanks for your reply. The nice thing about Unix is that we can
    construct solutions of arbitrary complexity solving (almost)
    every imaginable task.

    The task I have presented is quite primitive. With the previous
    posts I think using something like sort -k1.4,1.7n as part of
    a pipe serves quite well.

    In my opinion, how files are inserted into a tar archive should
    be controlled by tar options. That's why I'd still favor the
    existence of some tar --sort=mdate feature[*] instead of more
    or less complex workarounds. Being able to sort by numeric name
    could also be an option.

    It's certainly arguable whether tar should have an option --sort
    instead of letting an external tool do the sort, but tar option
    --sort is already there in GNU tar, so it would seem obvious to
    complete the set of option arguments.


    To put the pieces together; for now I think something along

    ls rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -

    (untested!) would serve me best.

    Janis

    [*] BTW, I noticed just now that the file dates turned out to be
    not significant to indicate generation of the respective files,
    so I will have to rely on the file name numbering in this case.

  • From Spiros Bousbouras@21:1/5 to Helmut Waitzmann on Sat May 7 17:26:01 2022
    On Sat, 07 May 2022 18:05:21 +0200
    Helmut Waitzmann <[email protected]> wrote:
    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
    sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
    sed --zero-terminated -E -e 's/^([[:graph:]]+ ){2}//' |
    tar cf sorted.tgz --no-recursion --null --files-from=-

    Let GNU find list the filenames, each of them prepended by its data
    modification time, then let GNU sort sort the list of the filenames
    by the prepended time stamps, then let GNU sed remove the prepended
    timestamps from the filenames and finally feed them to GNU tar.

    For sorting using modification time it's simpler to do

    find dir-with-files/ -printf '%T@ %p\0' |
    sort -z -n -k1 |
    gawk 'BEGIN { RS = "\0" } {print $2}' | etc.
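
One possible way to complete such a pipeline end to end, assuming GNU find, sort, sed and tar. This sketch swaps the gawk step for a sed substitution so that records stay NUL-terminated and names containing spaces survive; the scratch directory, file names and dates are hypothetical.

```shell
# Hypothetical scratch area; mtimes deliberately disagree with name order.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir dir-with-files
touch -t 200301010000 dir-with-files/a.txt
touch -t 200101010000 dir-with-files/b.txt
touch -t 200201010000 dir-with-files/c.txt

# %T@ prints the mtime as seconds since the epoch, so one numeric key suffices.
find dir-with-files/ -printf '%T@ %p\0' |
  sort -z -n -k 1,1 |
  sed -z 's/^[^ ]* //' |
  tar -cf sorted.tar --no-recursion --null -T -

# Files should appear oldest first: b, c, a.
listing=$(tar -tf sorted.tar)
printf '%s\n' "$listing"
```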

    Thanks to the "-printf" GNU find predicate, the "--zero-terminated"
    GNU sort and GNU sed options, and the "--null" GNU tar options that
    will work with any filename.

  • From Helmut Waitzmann@21:1/5 to All on Sat May 7 23:23:54 2022
    Spiros Bousbouras <[email protected]>:
    On Sat, 07 May 2022 18:05:21 +0200
    Helmut Waitzmann <[email protected]> wrote:
    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'

    For sorting using modification time it's simpler to do

    find dir-with-files/ -printf '%T@ %p\0' |

    I considered using "%T@" but refrained from using it because of the
    paragraph in the GNU find info manual:  "Below are the formats for
    the directives '%A', '%C', and '%T', which print the file's
    timestamps. Some of these formats might not be available on all
    systems, due to differences in the C 'strftime' function between
    systems."

    The POSIX definition of the "strftime" function (<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#top>)
    knows the conversion specifiers "Y", "m", "d", "H", "M", and "S",
    but not "@".

  • From Spiros Bousbouras@21:1/5 to Helmut Waitzmann on Sun May 8 06:10:09 2022
    On Sat, 07 May 2022 23:23:54 +0200
    Helmut Waitzmann <[email protected]> wrote:
    Spiros Bousbouras <[email protected]>:
    On Sat, 07 May 2022 18:05:21 +0200
    Helmut Waitzmann <[email protected]> wrote:
    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'

    For sorting using modification time it's simpler to do

    find dir-with-files/ -printf '%T@ %p\0' |

    I considered using "%T@" but refrained from using it because of the
    paragraph in the GNU find info manual: "Below are the formats for
    the directives '%A', '%C', and '%T', which print the file's
    timestamps. Some of these formats might not be available on all
    systems, due to differences in the C 'strftime' function between
    systems."

    The man page says

    %Ak File's last access time in the format specified by k, which is
    either `@' or a directive for the C `strftime' function. The
    possible values for k are listed below; some of them might not be
    available on all systems, due to differences in `strftime' between
    systems.

    @ seconds since Jan. 1, 1970, 00:00 GMT, with fractional
    part.

    So the @ directive is on top of what strftime() offers. Note that
    https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-strftime
    does not mention @ either. It does mention

    %s

    The number of seconds since the epoch, i.e., since
    1970-01-01 00:00:00 UTC. Leap seconds are not counted
    unless leap second support is available.

    This format is a GNU extension.

    but https://www.gnu.org/software/findutils/manual/html_mono/find.html
    does not say you can use %s for seconds since the epoch.

    The POSIX definition of the "strftime" function (<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#top>)
    knows the conversion specifiers "Y", "m", "d", "H", "M", and "S",
    but not "@".

    But POSIX does not mention --zero-terminated for sed or sort either.

  • From Axel Reichert@21:1/5 to Janis Papanagnou on Sun May 8 09:19:43 2022
    Janis Papanagnou <[email protected]> writes:

    I want the files in a tar archive in sorted form.

    After the IMHO fruitful discussion I would like to ask why you want to
    have them in sorted order in your tar file. I could not come up with a
    motivation for this myself. Could you please explain?

    Best regards

    Axel

  • From Janis Papanagnou@21:1/5 to Axel Reichert on Sun May 8 10:28:51 2022
    On 08.05.2022 09:19, Axel Reichert wrote:
    Janis Papanagnou <[email protected]> writes:

    I want the files in a tar archive in sorted form.

    After the IMHO fruitful discussion I would like to ask why you want to
    have them in sorted order in your tar file. I could not come up with a
    motivation for this myself. Could you please explain?

    Sure. In short: sorted item lists let you find specific items or detect
    inconsistencies more easily on inspection or on comparison with other data.

    If I inspect a foreign tar file I typically inspect the contents before
    deciding whether to unpack them. If I obtain a package with sorted
    numbered items that are a subset of a larger set I can easily identify
    whether a set of entities is in that package or not. It's just the usual
    effect that sorted item lists let you identify specific items more easily
    and faster. The alternative for me with an unsorted archive would be to
    sort the output of 'tar tvf' for that purpose. It's easier, though, to
    sort it once when populating the archive than to require it be sorted by
    the unpacking users many times. It's similar to, say, 'ls'; I don't want
    to type 'ls | sort -whatever' every time to get an order where I can
    easily spot what I am looking for. It may be just me (or few people) who
    prefer data sorted, but since it doesn't cost me anything to provide it
    sorted I decided to just do it that way.

    BTW, the displayed (and sorted by date) items let me (in the course of
    the discussion posts in this thread) recognize that the file's 'mtime'
    isn't consistent with the file numbers order. So we can consider the
    sorting also as a quality measure of data sets that helps finding bugs
    or data inconsistencies easier.

    And it's not only convenience for humans, also for computers/programs.
    I recall that in the 1990's (when I was closer to programming than I am
    now) we had sorted *.a (or *.so, don't recall) library archives. I don't
    recall the technical details or the exact rationale, but the reason was
    to increase the performance of the build process.

    And, finally, for those who don't see an advantage of sorted data, let
    me also point you to Donald Knuth's decades-old book series "The Art of
    Computer Programming" with the third book about "Sorting and Searching".
    In the introduction he points to the "most important applications of
    sorting"; a) Solving the "togetherness" problem, b) Matching items in
    two or more files, and c) Searching for information by key values, that
    closely resemble the reasons I had.

    Janis


    Best regards

    Axel


  • From Kenny McCormack@21:1/5 to [email protected] on Sun May 8 10:06:14 2022
    In article <[email protected]>,
    Spiros Bousbouras <[email protected]> wrote:
    ...
    So the @ directive is on top of what strftime() offers. Note that
    https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-strftime
    does not mention @ either. It does mention

    %s

    The number of seconds since the epoch, i.e., since
    1970-01-01 00:00:00 UTC. Leap seconds are not counted
    unless leap second support is available.

    This format is a GNU extension.

    but https://www.gnu.org/software/findutils/manual/html_mono/find.html does
    not say you can use %s for seconds since the epoch.

    I think the problem is that %s was already "taken" by "find" to mean
    "size", so they couldn't use %s (from strftime) to mean seconds since the
    epoch. So, they had to come up with something else (for "find" to use).

    --
    "Everything Roy (aka, AU8YOG) touches turns to crap."
    --citizens of alt.obituaries--

  • From Axel Reichert@21:1/5 to Janis Papanagnou on Sun May 8 12:37:42 2022
    Janis Papanagnou <[email protected]> writes:

    sorted item lists let you find specific items or detect
    inconsistencies easier

    Thanks. Spotting inconsistencies did not occur to me, although I have
    often used sorting for this.

    It may be just me (or few people) who prefer data sorted

    Me too, a habit passed on by my father. It also helps to find structure
    in the data and thus, contrary to its bean-counting image, might spark
    creativity.

    Best regards

    Axel

  • From Helmut Waitzmann@21:1/5 to All on Sun May 8 14:42:13 2022
    Spiros Bousbouras <[email protected]>:
    On Sat, 07 May 2022 23:23:54 +0200
    Helmut Waitzmann <[email protected]> wrote:
    Spiros Bousbouras <[email protected]>:
    On Sat, 07 May 2022 18:05:21 +0200
    Helmut Waitzmann <[email protected]> wrote:

    If you want the file names to be sorted by their file contents
    modification time, you could do

    TZ=UTC0 find dir-with-files/ \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'

    For sorting using modification time it's simpler to do

    find dir-with-files/ -printf '%T@ %p\0' |

    I considered using "%T@" but refrained from using it because of
    the paragraph in the GNU find info manual: "Below are the formats
    for the directives '%A', '%C', and '%T', which print the file's
    timestamps. Some of these formats might not be available on all
    systems, due to differences in the C 'strftime' function between
    systems."

    The man page says

    %Ak File's last access time in the format specified by k, which is
    either `@' or a directive for the C `strftime' function. The
    possible values for k are listed below; some of them might not be
    available on all systems, due to differences in `strftime' between
    systems.

    @ seconds since Jan. 1, 1970, 00:00 GMT, with fractional
    part.

    So the @ directive is on top of what strftime() offers.


    Strange.  I checked the manual page at my system (Debian buster)
    and it's indeed the same as yours.  But the GNU find info manual at
    my system says what I cited above.  The online GNU find info manual
    (<https://www.gnu.org/software/findutils/manual/html_node/find_html/Time-Formats.html#Time-Formats>)
    is even more clear:  "Below is an incomplete list of formats for
    the directives ‘%A’, ‘%B’, ‘%C’, and ‘%T’, which print the file’s
    timestamps.  Please refer to the documentation of strftime for the
    full list.  Some of these formats might not be available on all
    systems, due to differences in the implementation of the C strftime
    function."

    That is:  If a conversion specifier is not in the documentation of
    the strftime function ("the full list"), then it will not be
    available with GNU find.

    So the info manuals disagree with the manual page.  Which one of
    them is correct, which one is wrong?

    Also, the manual page says (in the SEE ALSO section):  "The full
    documentation for find is maintained as a Texinfo manual.  If the
    info and find programs are properly installed at your site, the
    command info find should give you access to the complete manual."

    That lets me assume that the info manual is more complete than the
    manual page.

    In the BUGS section, the manual page says:  "The environment
    variable LC_COLLATE has no effect on the -ok action." whereas in
    the EXPRESSION and ENVIRONMENT VARIABLES sections it states that
    the interpretation of the response given will be affected by the
    environment variable LC_COLLATE.

    Apparently the manual page contradicts itself.  That makes me doubt
    its reliability.  Perhaps it's a compilation of different
    sources?

    Note that
    https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-strftime
    does not mention @ either. It does mention

    %s

    The number of seconds since the epoch, i.e., since
    1970-01-01 00:00:00 UTC. Leap seconds are not counted
    unless leap second support is available.

    This format is a GNU extension.

    but
    https://www.gnu.org/software/findutils/manual/html_mono/find.html
    does not say you can use %s for seconds since the epoch.

    But since %s is neither part of the POSIX definition of the
    strftime function nor part of the find info manual I refrained from
    using it as well.

    The POSIX definition of the "strftime" function
    (<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#top>)
    knows the conversion specifiers "Y", "m", "d", "H", "M", and "S",
    but not "@".

    But POSIX does not mention --zero-terminated for sed or sort
    either.

    As in the OP Janis stated that he is using GNU tar, I assumed he
    might use GNU find, GNU sort, and GNU sed as well.  As he made no
    statement about the strftime function at his system, I preferred to
    be better safe than sorry.

  • From Brian Patrie@21:1/5 to Axel Reichert on Mon May 9 16:19:57 2022
    Axel Reichert wrote:
    Brian Patrie <[email protected]> writes:

    You can also use "-T -" to read the list of files from stdin.

    Ah, this avoids my xargs, great!

    find dir-with-files | sort --version-sort \
    | tar -czvf sorted.tgz --sort=none --no-recursion -T -

    [...]

    "--sort=none" tells tar not to do its own sorting.

    Would this be done otherwise, even though the files are given directly
    on the command line as arguments (respectively read from STDIN) and not
    created by globbing?

    It did for me.

    "--no-recursion" tells tar not to do its own directory diving

    Is my understanding correct that this happens only if "find" returns
    directories? So depending on the contents of Janis's "dir-with-files", a
    simple

    find dir-with-files -name "rfc*.txt"

    might do, even without "-type f".

    Yes, "-type f" would probably solve the same problem as
    "--no-recursion", as long as no other types need to be caught, and you
    don't need directory metadata in the archive (which may be desirable,
    depending on the use case). It would be needed even without
    subdirectories, as find will normally yield the specified dir in its
    output.

  • From Janis Papanagnou@21:1/5 to Brian Patrie on Tue May 10 00:09:11 2022
    On 09.05.2022 23:46, Brian Patrie wrote:
    Janis Papanagnou wrote:
    To put the pieces together; for now I think something along

    ls rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -

    (untested!) would serve me best.

    Just beware that subdirectories under rfcs/ may bugger things up.

    I don't have subdirectories, that's why I said it serves me best.

    Also,
    too many files might run you into the argv length limit (though that's
    mighty huge, these days).

    Ah, right. I might then replace that code by

    printf "%s\n" rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -

    which (as a shell built-in) doesn't have that limit. (Or I can use
    find, as suggested elsethread, though I prefer efficient built-ins.)
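
A self-contained sketch of this one-liner, assuming GNU tar and a sort that supports character offsets in keys. The scratch directory and the empty test files are hypothetical stand-ins for the real RFC files.

```shell
# Hypothetical scratch area holding the three files from the question.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir rfcs
touch rfcs/rfc8774.txt rfcs/rfc748.txt rfcs/rfc7168.txt

# With "/" as separator, field 2 is the file name; characters 4 to 7 of
# that field hold the RFC number, which is compared numerically.
printf '%s\n' rfcs/* | sort -t/ -k2.4,2.7n | tar -czf rfcs.tgz -T -

# Show the member order of the resulting archive.
listing=$(tar -tzf rfcs.tgz)
printf '%s\n' "$listing"
```

The key range 2.4,2.7 covers at most four digits, so numbers with five or more digits would need a wider range.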

    Janis

  • From Brian Patrie@21:1/5 to Janis Papanagnou on Mon May 9 16:46:50 2022
    Janis Papanagnou wrote:
    To put the pieces together; for now I think something along

    ls rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -

    (untested!) would serve me best.

    Just beware that subdirectories under rfcs/ may bugger things up. Also,
    too many files might run you into the argv length limit (though that's
    mighty huge, these days).
