I want the files in a tar archive in sorted form. (Using GNU tar.)
Either by date of the file or by its name (containing a number).
For example, I want these three files in sorted order as depicted:
rfc748.txt
rfc7168.txt
rfc8774.txt
I can add the files incrementally one by one to an empty archive,
but I wanted to know whether there's a trick that I missed to fill
the archive in one go, like tar cf sorted.tgz dir-with-files/
On a quick search and man page inspection I couldn't see anything.
On 06/05/2022 at 18.41, Janis Papanagnou wrote:
> I want the files in a tar archive in sorted form. (Using GNU tar.)
> [...]
> On a quick search and man page inspection I couldn't see anything.
Here a quick man page inspection reveals:
“--sort=ORDER
When creating an archive, sort directory entries according
to ORDER, which is one of none, name, or inode.”
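A minimal sketch of what --sort=name does with the three example names, assuming GNU tar 1.28 or later (the release that introduced --sort) and mktemp from coreutils; the files are empty placeholders in a throwaway directory. As far as I can tell, tar compares the names byte by byte, so the digits are ordered as text rather than as numbers:

```shell
set -eu
work=$(mktemp -d)                 # throwaway directory for the demo
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
touch dir-with-files/rfc748.txt dir-with-files/rfc7168.txt dir-with-files/rfc8774.txt

# --sort=name sorts the entries of each directory by name while archiving.
tar -cf byname.tar --sort=name dir-with-files/

# Member names in archive order: the directory itself first, then its files.
listing=$(tar -tf byname.tar)
printf '%s\n' "$listing"
```

If that reading is right, the files come out as rfc7168.txt, rfc748.txt, rfc8774.txt, i.e. lexicographic order rather than the numeric order asked for.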
and with the other options arbitrary w.r.t. the stated requirement.
rfc7168.txt
rfc748.txt
rfc8774.txt
Janis Papanagnou <[email protected]> writes:
> I want the files in a tar archive in sorted form. (Using GNU tar.)
> Either by date of the file or by its name (containing a number).
> [...]
> I can add the files incrementally one by one to an empty archive,
> but I wanted to know whether there's a trick that I missed to fill
> the archive in one go, like tar cf sorted.tgz dir-with-files/
I assume that something along
ls -tr dir-with-files/* | xargs tar cf sorted.tgz
is too brittle for you?
Best regards
Axel
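A sketch of the date-ordered idea with two changes of my own: the glob dir-with-files/* is handed to ls so that the printed names keep their directory prefix (given just the directory, ls prints bare file names that tar would then look for in the current directory), and a single tar reads the list via -T - rather than through xargs, which could split a long list over several tar runs that each recreate the archive. The timestamps are made up, and a newline inside a file name would still break the list:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
# Made-up modification times (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 202201151200 dir-with-files/rfc8774.txt
touch -t 202202151200 dir-with-files/rfc748.txt
touch -t 202203151200 dir-with-files/rfc7168.txt

# ls -t sorts its arguments by modification time, -r puts the oldest first;
# one tar process reads the newline-separated list from stdin.
ls -tr dir-with-files/* | tar -cf bydate.tar -T -

listing=$(tar -tf bydate.tar)
printf '%s\n' "$listing"
```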
On 2022-05-06, Janis Papanagnou <[email protected]> wrote:
> I want the files in a tar archive in sorted form. (Using GNU tar.)
> I can add the files incrementally one by one to an empty archive,
> but I wanted to know whether there's a trick that I missed to fill
> the archive in one go, like tar cf sorted.tgz dir-with-files/
Various tar(1) implementations can read a list of files to archive.
$ ls | sort >list
$ tar -c -I list -f sorted.tar
GNU tar also supports this.
$ gtar -c -T list -f sorted.tar
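A minimal sketch with GNU tar (installed as plain tar here), assuming the list file has already been written in the wanted order; in this sketch it is written by hand with printf, using the example names. What it is meant to show is that tar keeps the order of the list:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
touch dir-with-files/rfc748.txt dir-with-files/rfc7168.txt dir-with-files/rfc8774.txt

# Write the list in the desired (numeric) order, one name per line.
printf '%s\n' dir-with-files/rfc748.txt dir-with-files/rfc7168.txt \
    dir-with-files/rfc8774.txt >list

# -T (--files-from) makes GNU tar take its member names from the list file.
tar -c -T list -f sorted.tar

listing=$(tar -tf sorted.tar)
printf '%s\n' "$listing"
```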
On 06.05.2022 20:10, marrgol wrote:
> On 06/05/2022 at 18.41, Janis Papanagnou wrote:
>> I want the files in a tar archive in sorted form. (Using GNU tar.)
>> [...]
>> On a quick search and man page inspection I couldn't see anything.
> Here a quick man page inspection reveals:
> “--sort=ORDER
> When creating an archive, sort directory entries according
> to ORDER, which is one of none, name, or inode.”
Christian Weisgerber wrote:
> On 2022-05-06, Janis Papanagnou <[email protected]> wrote:
>> I want the files in a tar archive in sorted form. (Using GNU tar.)
>> I can add the files incrementally one by one to an empty archive,
>> but I wanted to know whether there's a trick that I missed to fill
>> the archive in one go, like tar cf sorted.tgz dir-with-files/
> Various tar(1) implementations can read a list of files to archive.
> $ ls | sort >list
> $ tar -c -I list -f sorted.tar
> GNU tar also supports this.
> $ gtar -c -T list -f sorted.tar
You can also use "-T -" to read the list of files from stdin. So:
find dir-with-files | sort --version-sort \
| tar -czvf sorted.tgz --sort=none --no-recursion -T -
I'm abusing sort's "--version-sort" option to get the order that Janis
wants (beware that this will sort decimals incorrectly--i couldn't get "--numeric-sort" to do the desired thing, for some unknown reason).
"--sort-none" tells tar not to do its own sorting. "--no-recursion"
tells tar not to do it's own directory diving--which would also muck
things up.
find (GNU findutils) 4.7.0-git
sort (GNU coreutils) 8.28
tar (GNU tar) 1.29
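A sketch that runs Brian's pipeline on empty placeholder files (without -v, so the only output is the listing), assuming GNU find, sort, tar and gzip. The second half is my own reading of the --numeric-sort puzzle, not something stated in the thread: sort -n reads a number from the start of each line, none of the paths starts with a digit, so every line ties and sort falls back to comparing whole lines as text:

```shell
set -eu
export LC_ALL=C                   # make the text comparisons locale-independent
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
touch dir-with-files/rfc748.txt dir-with-files/rfc7168.txt dir-with-files/rfc8774.txt

# Brian's pipeline minus -v: version sort compares digit runs numerically,
# --no-recursion stops tar from adding the files below the directory again.
find dir-with-files | sort --version-sort |
    tar -czf sorted.tgz --sort=none --no-recursion -T -
listing=$(tar -tzf sorted.tgz)

# For comparison: no line starts with a number, so -n sees only ties
# and the last-resort whole-line text comparison decides the order.
numeric=$(find dir-with-files | sort --numeric-sort)

printf '%s\n' "$listing"
```

A key that starts at the digits (for example with -t/ and a character offset, as Janis does further down) lets -n see the numbers.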
If you want the file names to be sorted by their file contents
modification time, you could do
TZ=UTC0 find dir-with-files/ \
-printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
sed --zero-terminated -E -e '/^([[:graph:]]+ ){2}/s///' |
tar cf sorted.tgz --no-recursion --null --files-from=-
Let GNU find list the filenames, each of them prepended by its data modification time, then let GNU sort sort the list of filenames
by the prepended timestamps, then let GNU sed remove the prepended timestamps from the filenames, and finally feed them to GNU tar.
Thanks to the "-printf" GNU find predicate, the "--zero-terminated"
GNU sort and GNU sed options, and the "--null" GNU tar option, this
will work with any filename.
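A sketch that runs this pipeline on three empty files with made-up modification times, assuming the GNU tools named above. Two details are my assumptions rather than Helmut's: -type f keeps the directory entry (whose own modification time just reflects the last file created in it) out of the list, and the sed step is written as a plain s command that strips the two leading timestamp fields:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
# Made-up modification times; the oldest file should come first.
touch -t 202201151200 dir-with-files/rfc8774.txt
touch -t 202202151200 dir-with-files/rfc748.txt
touch -t 202203151200 dir-with-files/rfc7168.txt

# NUL-terminated "YEAR MM-DDTHH:MM:SS.frac PATH" records, sorted by the numeric
# year, then the fixed-width rest of the timestamp, then the path; sed strips
# the two timestamp fields before tar reads the NUL-terminated names.
TZ=UTC0 find dir-with-files/ -type f \
    -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0' |
sort --zero-terminated -t ' ' -k 1,1n -k 2,2 -k 3 |
sed --zero-terminated -E -e 's/^([[:graph:]]+ ){2}//' |
tar -cf bydate.tar --no-recursion --null --files-from=-

listing=$(tar -tf bydate.tar)
printf '%s\n' "$listing"
```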
On Sat, 07 May 2022 18:05:21 +0200
Helmut Waitzmann <[email protected]> wrote:
> If you want the file names to be sorted by their file contents
> modification time, you could do
> TZ=UTC0 find dir-with-files/ \
> -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'
For sorting using modification time it's simpler to do
find dir-with-files/ -printf '%T@ %p\0' |
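The rest of Spiros's pipeline is not shown above. Purely as an illustration of the %T@ idea, and not as his code, one way to finish it is a single numeric key on the first field followed by a sed that strips that field. LC_ALL=C is my addition so that sort -n treats "." as the decimal point; -type f and the sample files are likewise my assumptions:

```shell
set -eu
export LC_ALL=C                   # "." as decimal point for sort -n
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dir-with-files
# Made-up modification times; the oldest file should come first.
touch -t 202201151200 dir-with-files/rfc8774.txt
touch -t 202202151200 dir-with-files/rfc748.txt
touch -t 202203151200 dir-with-files/rfc7168.txt

# "SECONDS.FRACTION PATH" records: sort numerically on the first field only,
# then strip that field so only the path reaches tar.
find dir-with-files/ -type f -printf '%T@ %p\0' |
sort --zero-terminated -t ' ' -k 1,1n |
sed --zero-terminated -e 's/^[^ ]* //' |
tar -cf bydate.tar --no-recursion --null --files-from=-

listing=$(tar -tf bydate.tar)
printf '%s\n' "$listing"
```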
Spiros Bousbouras <[email protected]>:
> On Sat, 07 May 2022 18:05:21 +0200
> Helmut Waitzmann <[email protected]> wrote:
>> If you want the file names to be sorted by their file contents
>> modification time, you could do
>> TZ=UTC0 find dir-with-files/ \
>> -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'
> For sorting using modification time it's simpler to do
> find dir-with-files/ -printf '%T@ %p\0' |
I considered using "%T@" but refrained from using it because of the
paragraph in the GNU find info manual: "Below are the formats for
the directives '%A', '%C', and '%T', which print the file's
timestamps. Some of these formats might not be available on all
systems, due to differences in the C 'strftime' function between
systems."
The POSIX definition of the "strftime" function (<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#top>)
knows the conversion specifiers "Y", "m", "d", "H", "M", and "S",
but not "@".
Janis Papanagnou <[email protected]> writes:
> I want the files in a tar archive in sorted form.
After the IMHO fruitful discussion I would like to ask why you want to
have them in sorted order in your tar file. I could not come up with a motivation for this myself. Could you please explain?
Best regards
Axel
Sorted item lists let you find specific items or detect
inconsistencies more easily.
It may be just me (or a few people) who prefer sorted data.
On Sat, 07 May 2022 23:23:54 +0200
Helmut Waitzmann <[email protected]> wrote:
> Spiros Bousbouras <[email protected]>:
>> On Sat, 07 May 2022 18:05:21 +0200
>> Helmut Waitzmann <[email protected]> wrote:
>>> If you want the file names to be sorted by their file contents
>>> modification time, you could do
>>> TZ=UTC0 find dir-with-files/ \
>>> -printf '%TY %Tm-%TdT%TH:%TM:%TS %p\0'
>> For sorting using modification time it's simpler to do
>> find dir-with-files/ -printf '%T@ %p\0' |
> I considered using "%T@" but refrained from using it because of
> the paragraph in the GNU find info manual: "Below are the formats
> for the directives '%A', '%C', and '%T', which print the file's
> timestamps. Some of these formats might not be available on all
> systems, due to differences in the C 'strftime' function between
> systems."
The man page says
%Ak File's last access time in the format specified by k, which is
either `@' or a directive for the C `strftime' function. The
possible values for k are listed below; some of them might not be
available on all systems, due to differences in `strftime' between
systems.
@ seconds since Jan. 1, 1970, 00:00 GMT, with fractional
part.
So the @ directive is on top of what strftime() offers.
Note that https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-strftime does not mention @ either. It does mention
%s
The number of seconds since the epoch, i.e., since
1970-01-01 00:00:00 UTC. Leap seconds are not counted
unless leap second support is available.
This format is a GNU extension.
but
https://www.gnu.org/software/findutils/manual/html_mono/find.html
does not say you can use %s for seconds since the epoch.
The POSIX definition of the "strftime" function
(<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#top>)
knows the conversion specifiers "Y", "m", "d", "H", "M", and "S",
but not "@".
But POSIX does not mention --zero-terminated for sed or sort
either.
Brian Patrie <[email protected]> writes:
You can also use "-T -" to read the list of files from stdin.
Ah, this avoids my xargs, great!
> find dir-with-files | sort --version-sort \
> | tar -czvf sorted.tgz --sort=none --no-recursion -T -
[...]
"--sort-none" tells tar not to do its own sorting.
Would this be done otherwise, even though the files are given directly
on the command line as arguments (or, here, read from STDIN) rather than produced by globbing?
"--no-recursion" tells tar not to do it's own directory diving
Is my understanding correct that this happens only if "find" returns directories? So depending on the contents of Janis's "dir-with-files", a simple
find dir-with-files -name "rfc*.txt"
might do, even without "-type f".
Janis Papanagnou wrote:
> To put the pieces together; for now I think something along
> ls rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -
> (untested!) would serve me best.
Just beware that subdirectories under rfcs/ may bugger things up.
Also, too many files might run you into the argv length limit
(though that's mighty huge these days).
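A sketch that checks Janis's pipeline on the example names (in a directory called rfcs, as in the command) next to a variant of mine that follows Brian's warnings: find instead of the rfcs/* glob avoids the argument-length limit, and -type f keeps subdirectories out. It assumes GNU tools, gzip and LC_ALL=C. Note also that the key -k2.4,2.7n looks at only four characters, so a hypothetical five-digit name such as rfc10000.txt would be sorted as 1000:

```shell
set -eu
export LC_ALL=C                   # predictable numeric parsing in sort
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir rfcs
touch rfcs/rfc748.txt rfcs/rfc7168.txt rfcs/rfc8774.txt

# Janis's pipeline as posted: field 2 (after "rfcs/"), characters 4-7,
# compared numerically ("748." parses as 748).
ls rfcs/* | sort -t/ -k2.4,2.7n | tar czf rfcs.tgz -T -
janis=$(tar tzf rfcs.tgz)

# Variant (my assumption): find lists the files without a long argv,
# -type f skips subdirectories, --no-recursion is an extra safety net.
find rfcs -type f -name 'rfc*.txt' | sort -t/ -k2.4,2.7n |
    tar czf rfcs-find.tgz --no-recursion -T -
variant=$(tar tzf rfcs-find.tgz)

printf '%s\n' "$janis"
```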