• Struggling with exec+find

    From Luc@21:1/5 to All on Sun Apr 30 16:13:01 2023
    ------------------------------------------
    set ::EXCEPTIONS {
        "/tmp/*" "/run/*" "/proc/*" "/sys/*"
        "/home/luc/Mail/*"
        "/var/ram/*" "/var/tmp/*"
    }

    # HANDLE ARGUMENTS - default to "/"
    set ::SCANPATH "/"
    if {$::argc > 0} {
        if {$::argc != 2} {puts "You must provide 2 arguments or nothing."; exit}
        set ::SCANPATH "[lindex $::argv 0]"
        set ::FILELISTFILE "[lindex $::argv 1]"
    }

    set _commandline "find $::SCANPATH -type f,d"
    if {$SCANPATH == "/"} {
        foreach i $EXCEPTIONS {
            set _commandline "$_commandline -path '$i' -prune -o"
        }
        set _commandline "$_commandline -print"
    }

    puts $_commandline
    set ::FILELIST [exec {*}$::_commandline]
    puts $::FILELIST
    ------------------------------------------

    The content of $_commandline should be

    find / -type f,d -path '/tmp/*' -prune -o -path '/run/*' -prune -o -path '/proc/*' -prune -o -path '/sys/*' -prune -o -path '/home/luc/Mail/*' -prune -o -path '/var/ram/*' -prune -o -path '/var/tmp/*' -prune -o -print

    And the output of the [puts] statement suggests it is correct.

    However, the exceptions don't work as expected. If I run that exact same command line manually, the result excludes files and directories matching
    any of the patterns in $::EXCEPTIONS. In my Tcl script, I soon spot /proc/*
    entries in the list of results, which is wrong.

    I suspect my exec line is wrong, but I may never know why.

    --
    Luc


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich@21:1/5 to Luc on Sun Apr 30 19:34:16 2023

    #1 - Don't mix string and list operations on the same data (unless you
    are *really* sure you know what you are doing).

    #2 - Tcl's exec does not run the command through bash, so the quoting
    bash requires is not needed on Tcl 'exec' command lines.

    Changes:

    set _commandline [list find $::SCANPATH -type f,d]
    ...
    foreach i $EXCEPTIONS {
        lappend _commandline -path $i -prune -o
    }
    lappend _commandline -print
    ...
    puts [join $_commandline]

    First change, _commandline is a proper list from the start, and is
    built up using lappend.

    Second change, no single quotes around the $i expansion. When you ran
    the command line from bash, bash removed the single quotes before it
    launched find. Tcl's exec was passing a literal '/proc/*' (literally
    with the single quotes) to find, which would not match anything.
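To see the second change in isolation, here is a small sketch. The two exception patterns are just examples, and it assumes an `echo` program on the PATH. It builds the same command both ways and prints the word that `exec {*}...` would hand to find as the first -path argument:

```tcl
# Two example exception patterns, enough to show the difference.
set exceptions {/tmp/* /proc/*}

# String-built, as in the original post: the single quotes become data.
set asString "find / -type f,d"
foreach i $exceptions {
    set asString "$asString -path '$i' -prune -o"
}
append asString " -print"

# List-built, as in the fix: each argument is its own list element.
set asList [list find / -type f,d]
foreach i $exceptions {
    lappend asList -path $i -prune -o
}
lappend asList -print

# Element 5 is the first -path argument that [exec {*}...] would pass.
puts [lindex $asString 5]   ;# '/tmp/*'
puts [lindex $asList 5]     ;# /tmp/*

# No shell is involved, so echo receives (and prints) the quotes too.
puts [exec echo '/tmp/*']   ;# '/tmp/*'
```

The string version only looked right in the [puts] output because single quotes are ordinary characters to Tcl; find was being asked to match paths that literally begin with a quote.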

  • From Luc@21:1/5 to Luc on Sun Apr 30 16:29:45 2023
    **************************
    On Sun, 30 Apr 2023 16:13:01 -0300, Luc wrote:

    puts $_commandline
    set ::FILELIST [exec {*}$::_commandline]
    puts $::FILELIST

    Another problem: I removed the puts line, and it seems the line
    above still causes the whole file list to be printed to stdout.
    I don't want that. Eventually I want it to run silently.


    --
    Luc


  • From Luc@21:1/5 to Rich on Sun Apr 30 17:55:26 2023

    Thank you again. Your code of course works.

    Then a new problem comes up. Right after that code, this:

    set ::FILELIST [join [lsort -dictionary $::FILELIST] "\n"]

    error:

    list element in quotes followed by "," instead of space
    while executing
    "lsort -dictionary $::FILELIST"
    (procedure "p.5.run" line 15)
    invoked from within
    "p.5.run"
    (file "/home/tcl/bin/dirlist-nix.tcl" line 188)


    The content is obviously the entire list of (almost) all files on my
    hard disk. It is not formatted data or even completely predictable data.
    Am I supposed to "escape" anything to make this work?

    OK. I'm going to try this:

    set ::FILELIST [join [lsort -dictionary [list $::FILELIST]] "\n"]

    It works. OK. But I'm not sure why.

    Why do I have to explicitly tell Tcl that $::FILELIST is a list, or rather,
    why do I have to convert it into a list? I always assumed that Tcl shimmered automatically as needed. Apparently, I've been wrong.


    --
    Luc


  • From Rich@21:1/5 to Luc on Sun Apr 30 23:30:50 2023
    Luc wrote:
    Thank you again. Your code of course works.

    Then a new problem comes up. Right after that code, this:

    set ::FILELIST [join [lsort -dictionary $::FILELIST] "\n"]

    error:

    list element in quotes followed by "," instead of space
    while executing
    "lsort -dictionary $::FILELIST"
    (procedure "p.5.run" line 15)
    invoked from within
    "p.5.run"
    (file "/home/tcl/bin/dirlist-nix.tcl" line 188)


    The content is obviously the entire list of (almost) all files on my
    hard disk. It is not formatted data or even completely predictable data.
    Am I supposed to "escape" anything to make this work?

    No, you are supposed to handle it with a wee bit of knowledge of how
    Tcl's going to interpret it.

    OK. I'm going to try this:

    set ::FILELIST [join [lsort -dictionary [list $::FILELIST]] "\n"]

    It works. OK. But I'm not sure why.

    Because you created a list, containing a single element, that single
    element being the entire string you got back from your exec call.

    Why do I have to explicitly tell Tcl that $::FILELIST is a list

    You did not tell Tcl it was a list.

    , or rather, why do I have to convert it into a list?

    Because lsort works on lists and only lists. If you pass it a string
    (which is what you are doing), Tcl will make an attempt to convert the
    string to a list. But, if the string is not properly formatted (i.e.,
    certain delimiter characters not properly escaped) that attempt to
    convert to a list will fail, with the error message you got.

    This is why I keep cautioning you to: *do not use list operators on
    strings*. This is the error you get when you do that, except you don't
    always get it. Sometimes it stays hidden for years, until one day, a
    string with an unescaped delimiter passes through the code, and then,
    *boom* things blow up.
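The failure can be reproduced without running find at all. A minimal sketch, using a made-up two-line "find output" in which the second filename contains a space, double quotes and a comma:

```tcl
# Hypothetical find output: two paths separated by one newline.
set output "/home/luc/notes.txt\n/home/luc/My \"draft\", final.txt"

# lsort must first parse the string as a list. After the space, the next
# word starts with a double quote, and the closing quote is followed by a
# comma, so the parse fails with the same kind of error seen above.
if {[catch {lsort -dictionary $output} msg]} {
    puts "lsort failed: $msg"
}

# [list $output] parses nothing: it wraps the whole string as a single
# element, so there is exactly one thing to "sort".
puts [llength [list $output]]   ;# 1

# [split] converts the string into a real list, one element per line.
set files [lsort -dictionary [split $output \n]]
puts [llength $files]           ;# 2
```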

    I always assumed that Tcl shimmered automatically as needed.

    It does, but it makes assumptions about how the string being shimmered is formatted. If the string violates those assumptions, then you get this
    error message.

    Apparently, I've been wrong.

    Yep.

    Now, for your fix, there are three possible fixes. One: if you know for
    sure you will never have a newline in a filename (Linux allows
    filenames to contain newlines, but usually one has to go out of one's
    way to create one), then you need to *convert* the *string* you receive
    from [exec] into a list *before* you try to sort it.

    You convert it to a list using [split]. You'd first need to do:

    [split $::FILELIST \n]

    to break the string in ::FILELIST into a proper list, using \n as the delimiter. Then lsort will sort the resulting, proper list just fine.

    Now, if you want to be absolutely sure you never have problems when
    exec'ing 'find' to obtain filenames, replace your "-print" with
    "-print0", which instructs find to terminate each filename with an
    ASCII NUL instead of a newline. ASCII NUL is one of the very few
    characters that cannot be used in a Linux filename, so it can never
    occur inside a filename you receive from find.

    Then, immediately after receiving the giant string back from find in the
    exec call, convert it to a proper list using split, only with \0 as the
    delimiter instead of \n. Because find also puts a NUL after the last
    filename, the final element of the split result is an empty string,
    which lrange drops:

    set ::FILELIST [lrange [split $::FILELIST \0] 0 end-1]

    I did a quick test here using 8.6.12 on Linux and the ASCII NULs
    arrive from the [exec] call and therefore can be used as the delimiter
    for [split].
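A sketch of the -print0 approach end to end, assuming a find that supports -print0 (GNU find does) and a writable temporary directory. The `print0-demo-*` directory and the filenames are hypothetical, created only for the demo and deleted afterwards:

```tcl
# Scratch directory with deliberately awkward names: a space, double
# quotes, a comma, and an embedded newline.
set base [expr {[info exists ::env(TMPDIR)] ? $::env(TMPDIR) : "/tmp"}]
set dir [file join $base print0-demo-[pid]]
file mkdir $dir
foreach name [list plain.txt "My \"draft\", final.txt" "two\nlines.txt"] {
    close [open [file join $dir $name] w]
}

# Each argument is passed to find as its own word; there is no shell,
# so no quoting is needed.
set raw [exec find $dir -type f -print0]

# find terminates EVERY name with a NUL, including the last one, so the
# final element produced by split is an empty string: drop it.
set files [lrange [split $raw \0] 0 end-1]

foreach f [lsort -dictionary $files] {
    # Show embedded newlines as \n so each name stays on one line.
    puts [string map [list \n {\n}] [file tail $f]]
}
file delete -force $dir
```

The name containing a newline comes through as a single list element, which the newline-based [split] from the first fix could not guarantee.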

    The third fix is you switch to using Tcllib's fileutil module,
    specifically its ::fileutil::find call, creating your own 'filtercmd'
    to remove from what is found those directory trees you don't want to do anything with.

  • From Rich@21:1/5 to Luc on Mon May 1 00:56:12 2023
    Luc wrote:
    On Sun, 30 Apr 2023 23:30:50 -0000 (UTC), Rich wrote:

    No, you are supposed to handle it with a wee bit of knowledge of how
    Tcl's going to interpret it.


    Thank you again for the excellent reply as usual.

    You even managed to answer an additional question I was going to post,
    not needed anymore. That's just how good you are. :-)

    I think I understand everything you explained except one minor detail that
    is piquing my curiosity:


    , or rather, why do I have to convert it into a list?

    Because lsort works on lists and only lists. If you pass it a string
    (which is what you are doing), Tcl will make an attempt to convert the
    string to a list. But, if the string is not properly formatted (i.e.,
    certain delimiter characters not properly escaped) that attempt to
    convert to a list will fail, with the error message you got.

    How come [list $whatever] works so well to fix it and Tcl automatic shimmering doesn't?

    Because [list $whatever] creates a single element list. It is
    analogous to your having done this:

    set array(0) $whatever

    The [list $whatever] construct takes the "thing" in $whatever (which in
    your original post was a string) and converts it into a list with one
    element.

    However if you just do:

    [lindex $whatever 5] and '$whatever' is not already a proper list, then
    Tcl tries to be helpful and makes an attempt to convert it to a list.
    The resulting list might have one element, or it might have twenty
    thousand elements, or it might not convert at all. The error message
    you got is what happens when it can't convert. And the conversion Tcl
    tries to do is the inverse of the conversion it does when you shimmer a
    list into a string:

    $ rlwrap tclsh
    % set l [list # a \"\" b]
    {#} a {""} b
    % puts $l
    {#} a {""} b
    %

    Tcl's shimmer from string to list assumes the string has that special
    extra formatting shown above for special characters. If the string
    contains no special characters, the shimmer gives almost the same
    result as [split $string]. That is why it often seems to work during
    development: none of the test strings have the "special characters"
    inside, so it seems to just work. Then, a year later, someone sends
    through a string that is not formatted correctly, and things seem to
    blow up at random.
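That "special extra formatting" is exactly what [list] adds when it produces a string, which is why a list built with list commands survives a trip through a plain string while hand-concatenated text may not. A small sketch with a made-up filename:

```tcl
set name "My \"draft\", final.txt"

# Built with [list]: Tcl braces the awkward element when it makes the
# string form, so parsing that string back gives the same elements.
set good [list $name plain.txt]
set goodText [format %s $good]    ;# a copy as plain text
puts $goodText                    ;# {My "draft", final.txt} plain.txt
puts [lindex $goodText 0]         ;# My "draft", final.txt

# Built by string concatenation: no braces, so the parse goes wrong.
set bad "$name plain.txt"
puts [catch {lindex $bad 0} msg]  ;# 1
puts $msg
```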

    Why doesn't Tcl just use the [list] method instead?

    Your [list $whatever] is not "converting $whatever to the list
    $whatever represents". It is "creating a new list, with the string
    $whatever as its only element". The difference between those two
    statements is critical to "getting" why mixing string operators with
    lists, or list operators on plain strings, is dangerous. The shimmer
    is trying to find 1..N elements inside $whatever by looking through
    the characters inside; the [list $whatever] version never looks at what
    is inside of $whatever.
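The difference between parsing $whatever and wrapping it can be seen side by side (a sketch; the sample string is arbitrary):

```tcl
set s {alpha beta "gamma delta"}

# Shimmer: Tcl reads the characters of $s looking for elements; the
# double quotes group "gamma delta" into one element.
puts [llength $s]            ;# 3

# [list $s] never looks inside $s: it makes a new one-element list.
puts [llength [list $s]]     ;# 1
puts [lindex [list $s] 0]    ;# alpha beta "gamma delta"

# [split] ignores list syntax entirely and just cuts at whitespace.
puts [llength [split $s]]    ;# 4
```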

    The third fix is you switch to using Tcllib's fileutil module,
    specifically its ::fileutil::find call, creating your own 'filtercmd'
    to remove from what is found those directory trees you don't want to do
    anything with.

    I've tried that method (without any filters) and I was not happy with
    the performance. I may try it again, but I doubt it can be faster than 'find.'

    If you are fighting performance, then find, or simply not scanning
    directory trees you don't want, will likely be faster. I.e., do
    something like (written here, not tested):

    set skip {/proc /sys /dev}
    set items [glob /*]
    set filtered [list]
    foreach item $items {
        if {[file isdirectory $item] && $item in $skip} {
            # $item is /proc, /sys, /dev or anything else listed in
            # $skip that you don't want: do nothing with it
            continue
        }
        lappend filtered $item
    }

    # now build a 'find' command line out of contents of $filtered, or
    # use $filtered to run ::fileutil::find on the items that are of
    # interest
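The "build a 'find' command line out of $filtered" step might look like the following. This is a hedged sketch only: `prefiltered_find_command` and the skip list are hypothetical, matching is on top-level directory names only, and `glob *` does not return dot-files:

```tcl
# Hypothetical helper: list the top-level entries of $root, drop the
# directories whose names are in $skip, and build one find command over
# everything that is left.
proc prefiltered_find_command {root skip} {
    set keep [list]
    foreach item [lsort [glob -nocomplain -directory $root *]] {
        if {[file isdirectory $item] && [file tail $item] in $skip} {
            continue   ;# unwanted tree: never handed to find at all
        }
        lappend keep $item
    }
    # {*} splices each kept path in as its own argument; no shell quoting.
    return [list find {*}$keep -type f,d -print0]
}

set cmd [prefiltered_find_command / {proc sys dev run tmp}]
puts [join $cmd]
```

Deeper exclusions (such as /home/luc/Mail) would still need either -prune on the find side or a recursive version of the loop.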

  • From Luc@21:1/5 to Rich on Sun Apr 30 21:39:13 2023
    On Sun, 30 Apr 2023 23:30:50 -0000 (UTC), Rich wrote:

    No, you are supposed to handle it with a wee bit of knowledge of how
    Tcl's going to interpret it.


    Thank you again for the excellent reply as usual.

    You even managed to answer an additional question I was going to post,
    not needed anymore. That's just how good you are. :-)

    I think I understand everything you explained except one minor detail that
    is piquing my curiosity:


    , or rather, why do I have to convert it into a list?

    Because lsort works on lists and only lists. If you pass it a string
    (which is what you are doing), Tcl will make an attempt to convert the
    string to a list. But, if the string is not properly formatted (i.e., certain delimiter characters not properly escaped) that attempt to
    convert to a list will fail, with the error message you got.

    How come [list $whatever] works so well to fix it and Tcl automatic
    shimmering doesn't? Why doesn't Tcl just use the [list] method instead?
    Why can't airplanes be made of the same material as their black boxes?


    The third fix is you switch to using Tcllib's fileutil module,
    specifically its ::fileutil::find call, creating your own 'filtercmd'
    to remove from what is found those directory trees you don't want to do anything with.

    I've tried that method (without any filters) and I was not happy with
    the performance. I may try it again, but I doubt it can be faster than
    'find.'


    --
    Luc


  • From Luc@21:1/5 to Rich on Sun Apr 30 22:16:53 2023
    On Mon, 1 May 2023 00:56:12 -0000 (UTC), Rich wrote:

    If you are fighting performance, then find, or simply not scanning
    directory trees you don't want, will likely be faster. I.e., do
    something like (written here, not tested):

    I see your point, but I do that already with the 'find' parameters.

    set ::EXCEPTIONS {
        "/tmp/*" "/run/*" "/proc/*" "/sys/*"
        "/home/luc/Mail/*" "/var/ram/*" "/var/tmp/*"
    }

    set _commandline [list find $::SCANPATH -type f,d]
    if {$SCANPATH == "/"} {
        foreach i $EXCEPTIONS {
            lappend _commandline -path $i -prune -o
        }
        lappend _commandline -print
    }

    (already fixed according to your help)

    Is that functionally different from the code you just suggested?

    --
    Luc


  • From Rich@21:1/5 to Luc on Mon May 1 03:44:08 2023

    My code was an example of a way to do some 'prefiltering' for using
    Tcllib's find. But looking over your exceptions again now, you are
    excluding some directories that are several layers down, which would
    make the prefiltering much more troublesome vs. your use of GNU find
    and its prune option.
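Rich's point about exclusions several levels down can be checked on a throwaway tree. A sketch, assuming GNU find (for `-type f,d`) and a writable temporary directory; `scan_tree` and the directory names are hypothetical, and the helper simply combines the fixes from this thread (list-built command, unquoted -path patterns, -print0 split on NUL):

```tcl
# Hypothetical helper: build the find command as a proper list, prune
# every path matching one of $exceptions, and split the NUL-terminated
# output (dropping the empty element after the final NUL).
proc scan_tree {root exceptions} {
    set cmd [list find $root -type f,d]
    foreach pattern $exceptions {
        lappend cmd -path $pattern -prune -o
    }
    lappend cmd -print0
    return [lsort -dictionary [lrange [split [exec {*}$cmd] \0] 0 end-1]]
}

# Scratch tree with an exclusion three levels down.
set base [expr {[info exists ::env(TMPDIR)] ? $::env(TMPDIR) : "/tmp"}]
set root [file join $base prune-demo-[pid]]
file mkdir [file join $root keep] [file join $root home luc Mail]
close [open [file join $root keep b.txt] w]
close [open [file join $root home luc Mail c.txt] w]

set found [scan_tree $root [list [file join $root home luc Mail]/*]]
foreach f $found { puts $f }
file delete -force $root
```

One caveat, not exercised here: [exec] raises an error when the child writes to stderr or exits with a non-zero status, so a real scan of / as a normal user (where find typically reports "Permission denied") would need a [catch] or similar around it.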
