• Problem with filenames that include emoji characters

    From Michael Soyka@21:1/5 to All on Fri Sep 1 17:25:46 2023
    I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10 system. I recently received a collection of .eml files whose filenames include emoji characters. I assume these files were created by some email client such as Outlook. When emails
    are saved to a file the Subject line is used for the filename. I assume that this is how the filenames came to include emoji characters.

    Now to the problem. When I try to access these files using Tcl, I get what I consider to be nonsensical errors. For example, the "open" command fails with the message "filename is invalid on this platform", even though the file does exist. On the
    other hand, various "file" commands that also take a filename argument, such as "exists" and "size", return "no such file or directory". Again, the file certainly does exist.

    I can confirm that the emoji characters in these filenames have the values \U01F495 and \U01F49E, "two hearts" and "revolving hearts". The filenames also include the characters "FADED LOVERS TOUR" so I suppose that justifies their inclusion. :)

    I haven't been able to construct such a filename using Tcl commands. Instead, I've used "glob" to get the filename from the filesystem (NTFS) and used the result as the argument for "open" and "file".

    I admit I'm inexperienced in things UTF-8, encodings and code pages but
    is this a bug to report or do I need to fill-in some gaps in my education?

    Thanks in advance for any comments,
    -mike

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Leitgeb@21:1/5 to Michael Soyka on Sat Sep 2 06:05:38 2023
    Michael Soyka <[email protected]> wrote:
    > I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
    > system. I recently received a collection of .eml files whose filenames
    > include emoji characters. [...]

    As an immediate answer, I'd suggest looking at Christian Werner's
    undroidwish (no typo: it's AndroWish for Android devices, ported back
    to the PC).

    Unicode emojis are not well supported by standard Tcl 8.6.*
    Work is in progress on fixing that for 8.7 and 9.0.

    undroidwish has a "variant" of 8.6 that already supports them
    to some degree, at the cost that not all extensions can be
    loaded into it. (Others might be able to explain it better.)

  • From Michael Soyka@21:1/5 to Andreas Leitgeb on Sat Sep 2 10:40:48 2023
    On 09/02/2023 2:05 AM, Andreas Leitgeb wrote:
    > As an immediate answer, I'd suggest looking at Christian Werner's
    > undroidwish [...]

    Thank you for the information, I'll look into undroidwish.

    > Unicode emojis are not well supported by standard Tcl 8.6.*
    > There is work in progress about fixing it for 8.7 and 9.0.

    So this is a known issue and there's no reason to file a bug report.

  • From Harald Oehlmann@21:1/5 to All on Wed Sep 6 18:21:35 2023
    On 02.09.2023 at 16:40, Michael Soyka wrote:
    > [...]
    > So this is a known issue and there's no reason to file a bug report.

    Well, we recently had a fix for this issue on Linux (TIP 671).
    The argument for not fixing it was that there are no bug reports on it ;-).

    Take care,
    Harald

  • From Rolf Ade@21:1/5 to Michael Soyka on Thu Sep 7 00:16:03 2023
    Michael Soyka <[email protected]> writes:
    > I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
    > [...]
    > filenames came to include emoji characters.
    >
    > Now to the problem. When I try to access these files using Tcl, I get
    > what I consider to be nonsensical errors. For example, the "open"
    > command fails with the message "filename is invalid on this platform",
    > even though the file does exist. On the other hand, various "file"
    > commands that also take a filename argument, such as "exists" and
    > "size", return "no such file or directory". Again, the file certainly
    > does exist.

    You haven't shown us how you call those commands in Tcl (with the emoji
    as a literal in the source code or escaped as \Uxxxxx, for example) or
    what encoding your source file has.

    Since Tcl 8.6.10, I think, and for sure with the upcoming Tcl 9, there
    is no problem handling such filenames (containing Unicode code points
    such as emojis in proper UTF-8).

    See for example:

    # The following is: a\U1f972
    set fd [open a🥲 w+]
    # \U1f926
    puts $fd 🤦
    close $fd

    set fd [open a🥲]
    puts [read $fd]
    close $fd

    This script works for me with 8.6.10, 8.6.13 and 9, though this is
    on Linux.

    > I haven't been able to construct such a filename using Tcl commands.
    > Instead, I've used "glob" to get the filename from the filesystem
    > (NTFS) and used the result as the argument for "open" and "file".

    So you can construct the filenames with results of Tcl commands and successfully open the files?

    rolf

  • From Michael Soyka@21:1/5 to Rolf Ade on Thu Sep 7 16:11:33 2023
    On 09/06/2023 6:16 PM, Rolf Ade wrote:

    > [...]
    > You haven't shown us how you call those commands in Tcl (with the emoji
    > as a literal in the source code or escaped as \Uxxxxx, for example) or
    > what encoding your source file has.

    The filenames were obtained using the "glob" command. The files
    themselves were created, I believe, by others using a mail client on
    Windows.

    > Since Tcl 8.6.10, I think, and for sure with the upcoming Tcl 9, there
    > is no problem handling such filenames [...]
    >
    > So you can construct the filenames with results of Tcl commands and
    > successfully open the files?

    The only reason I tried to create a file that includes emoji characters
    in its name was to investigate the contradictory responses I was getting
    from the "open" and "file" commands.

    However, that's not the primary issue I tried to raise so I'll try to be
    more specific.

    I was given a collection of files on a thumb drive. One of the files
    contains a sequence of three emoji characters in its name: "two hearts",
    "revolving hearts" and "two hearts". The corresponding Unicode values
    are \U01F495 and \U01F49E. I believe this partly because of the output
    of the following code:

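    # Print each character of a string together with its code point in hex.
    proc DisplayCharCodes {string} {
        foreach c [split $string {}] {
            puts [format {%s: %#x} $c [scan $c %c]]
        }
    }
    set fileList [glob -type f *.eml]
    # Index 1 selects the second file in the list.
    set filename [lindex $fileList 1]
    DisplayCharCodes $filename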

    which outputs the following:

    N: 0x4e
    E: 0x45
    X: 0x58
    T: 0x54
    : 0x20
    S: 0x53
    A: 0x41
    T: 0x54
    .: 0x2e
    : 0x20
    2: 0x32
    _: 0x5f
    1: 0x31
    5: 0x35
    _: 0x5f
    : 0x20
    F: 0x46
    A: 0x41
    D: 0x44
    E: 0x45
    D: 0x44
    : 0x20
    L: 0x4c
    O: 0x4f
    V: 0x56
    E: 0x45
    R: 0x52
    S: 0x53
    : 0x20
    T: 0x54
    O: 0x4f
    U: 0x55
    R: 0x52
    : 0x20
    i: 0x69
    n: 0x6e
    : 0x20
    P: 0x50
    R: 0x52
    O: 0x4f
    V: 0x56
    I: 0x49
    D: 0x44
    E: 0x45
    N: 0x4e
    C: 0x43
    E: 0x45
    !: 0x21
    : 0x20
    💕: 0x1f495
    💞: 0x1f49e
    💕: 0x1f495
    .: 0x2e
    e: 0x65
    m: 0x6d
    l: 0x6c

    Given the above, this is what "open" returns:

    % open $filename r
    couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml": filename is invalid on this platform

    and the response of "file exists $filename" is zero.
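
    For background on why code points such as 0x1f495 may be treated differently on Windows: Windows file APIs take UTF-16 names, and in UTF-16 any code point above U+FFFF is stored as two 16-bit surrogates. The sketch below shows only that arithmetic. The helper name SurrogatePair is made up for illustration, and the sketch says nothing about what Tcl 8.6 actually does internally.

```tcl
# Hypothetical helper (not part of Tcl): split a code point above U+FFFF
# into its two 16-bit UTF-16 surrogates, high surrogate first.
proc SurrogatePair {codePoint} {
    if {$codePoint < 0x10000 || $codePoint > 0x10FFFF} {
        error "code point outside the supplementary range: $codePoint"
    }
    set offset [expr {$codePoint - 0x10000}]
    set high [expr {0xD800 + ($offset >> 10)}]
    set low  [expr {0xDC00 + ($offset & 0x3FF)}]
    return [list $high $low]
}

# The two code points reported by DisplayCharCodes.
foreach cp {0x1F495 0x1F49E} {
    lassign [SurrogatePair $cp] high low
    puts [format {U+%X -> 0x%04X 0x%04X} $cp $high $low]
}
```

    So each of the three emoji in the filename occupies two 16-bit units in the name that Windows stores.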

    So I'm looking for a reason behind this inconsistent and, in my mind, nonsensical behavior. Is it a Windows issue, a Tcl issue, a little of
    both and/or something else?

    I hope the above clarifies my problem.

    -mike


  • From Rolf Ade@21:1/5 to Michael Soyka on Fri Sep 8 02:15:08 2023
    Michael Soyka writes:
    > [...]
    > So I'm looking for a reason behind this inconsistent and, in my mind,
    > nonsensical behavior. Is it a Windows issue, a Tcl issue, a little of
    > both and/or something else?
    >
    > I hope the above clarifies my problem.

    Thanks.

    Yes, typically you should be able to use any file name returned by glob
    as an argument for open or file exists. There is an exception to that
    rule (what Harald mentioned) and that may be in play here.
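
    That expectation can be checked mechanically. A minimal sketch, assuming the files of interest sit in one directory; the helper name GlobRoundTripFailures is made up for illustration. It only reports which names fail the round trip, not why they fail.

```tcl
# Hypothetical helper (not part of Tcl): return every name that glob finds
# in $dir but that "file exists" then fails to find again.
proc GlobRoundTripFailures {dir {pattern *}} {
    set failures {}
    foreach name [glob -nocomplain -directory $dir -types f -- $pattern] {
        if {![file exists $name]} {
            lappend failures $name
        }
    }
    return $failures
}

# Report the .eml files in the current directory that do not round-trip.
foreach name [GlobRoundTripFailures [pwd] *.eml] {
    puts "glob returned a name that file exists rejects: $name"
}
```

    An empty result means every name returned by glob was accepted again by "file exists".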

    Can you open the file in question with the file explorer? Perhaps you
    can truncate it and provide it for download somewhere (in the hope that
    the "strangeness" of the file name survives these actions, which is not
    a given)?

    The one known scenario that shows what you describe (you can't open a
    filename you got from glob) is that the file names are written in an
    encoding other than the one the system uses for its filenames. However,
    in the results of your own investigation I cannot see any indication
    that this is the case here.

    But perhaps it is in fact a quirk of the Windows APIs being used (or of
    how they are used). At least you are right in saying that this is strange
    and needs an explanation, if it's not the thing from above.

    rolf

  • From Michael Soyka@21:1/5 to Rolf Ade on Fri Sep 8 14:49:48 2023
    On 09/07/2023 8:15 PM, Rolf Ade wrote:

    > [...]
    > Can you open the file in question with the file explorer? Perhaps you
    > can truncate it and provide it for download somewhere (in the hope that
    > the "strangeness" of the file name survives these actions, which is not
    > a given)?

    Yes, using Windows Explorer I can open the file with Vim and open the
    file with Outlook. I can also rename the file, deleting the 3 emoji characters, and open it using the Tcl commands "glob" and "open".


    > The one known scenario that shows what you describe (you can't open a
    > filename you got from glob) is that the file names are written in an
    > encoding other than the one the system uses for its filenames. [...]
    >
    > But perhaps it is in fact a quirk of the Windows APIs being used (or of
    > how they are used). [...]

    I've since copied the files from the same thumb drive onto my Linux
    system and retried the "glob" and "open" commands using 8.6.10: it all
    works. My Windows version is 8.6.12, a later version, so it appears
    that my problems are peculiar to Windows.

    Thanks for your continuing interest- it's helped motivate me to look
    deeper into the problem.

    -mike


  • From Robert Heller@21:1/5 to [email protected] on Fri Sep 8 19:52:04 2023
    At Fri, 8 Sep 2023 14:49:48 -0400 Michael Soyka <[email protected]> wrote:


    > [...]
    > which outputs the following:
    >
    > [...]
    ΓƒΒ°Γ‚ΒŸΓ‚Β’Γ‚Β•: 0x1f495
    ΓƒΒ°Γ‚ΒŸΓ‚Β’Γ‚Βž: 0x1f49e
    ΓƒΒ°Γ‚ΒŸΓ‚Β’Γ‚Β•: 0x1f495

    Noticing that these are 16-bit characters...

    > [...]
    > I've since copied the files from the same thumb drive onto my Linux
    > system and retried the "glob" and "open" commands using 8.6.10: it all
    > works. My Windows version is 8.6.12, a later version, so it appears
    > that my problems are peculiar to Windows.

    What does DisplayCharCodes display under Linux? Do the emoji chars display as 16-bit chars or two 8-bit characters?
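
    As a point of reference for that question: a code point in the range U+10000 to U+10FFFF takes four 8-bit bytes in UTF-8, and two 16-bit units in UTF-16. Below is a minimal sketch of the UTF-8 side, written as plain arithmetic so that it does not depend on how a particular Tcl build stores such characters. The helper name Utf8Bytes4 is made up for illustration.

```tcl
# Hypothetical helper (not part of Tcl): compute the four UTF-8 bytes
# of a code point in the range U+10000..U+10FFFF.
proc Utf8Bytes4 {codePoint} {
    if {$codePoint < 0x10000 || $codePoint > 0x10FFFF} {
        error "not a four-byte UTF-8 code point: $codePoint"
    }
    return [list \
        [expr {0xF0 | ($codePoint >> 18)}] \
        [expr {0x80 | (($codePoint >> 12) & 0x3F)}] \
        [expr {0x80 | (($codePoint >> 6) & 0x3F)}] \
        [expr {0x80 | ($codePoint & 0x3F)}]]
}

# Print the byte sequence in hex for the two code points in the filename.
foreach cp {0x1F495 0x1F49E} {
    set hex {}
    foreach byte [Utf8Bytes4 $cp] {
        lappend hex [format %02X $byte]
    }
    puts [format {U+%X -> %s} $cp [join $hex " "]]
}
```

    Comparing these byte values with a byte dump of the filename on each system would show which encoding the name is actually stored in there.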






    --
    Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
    Deepwoods Software -- Custom Software Services
    http://www.deepsoft.com/ -- Linux Administration Services
    [email protected] -- Webhosting Services

  • From Michael Soyka@21:1/5 to Robert Heller on Fri Sep 8 16:15:12 2023
    On 09/08/2023 3:52 PM, Robert Heller wrote:
    > [...]
    ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’: 0x1f495
    ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚ΕΎ: 0x1f49e
    ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’: 0x1f495

    > Noticing that these are 16-bit characters...

    This doesn't look like what I see in my posts, which show the
    characters themselves.


    .: 0x2e
    e: 0x65
    m: 0x6d
    l: 0x6c

    Given the above, this is what "open" returns:

    % open $filename r
    couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
    ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚ΕΎΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’.eml": filename is invalid on this platform

    and the response of "file exists $filename" is zero.

    So I'm looking for a reason behind this inconsistent and, in my mind,
    nonsensical behavior. Is it a Windows issue, a Tcl issue, a little of >>>> both and/or something else?

    I hope the above clarifies my problem.

    Thanks.

    Yes, typically you should be able to use any file name returned by glob
    as an argument for open or file exists. There is an exception to that rule
    (the one Harald mentioned) and it may apply here.

    Can you open the file in question with the file explorer? Perhaps you
    can truncate it and provide it as a download somewhere (in the hope that
    the "strangeness" of the file name survives these actions, which is not a
    given)?

    Yes, using Windows Explorer I can open the file with Vim and open the
    file with Outlook. I can also rename the file, deleting the 3 emoji
    characters, and open it using the Tcl commands "glob" and "open".


    The one known scenario which shows what you describe (you can't open a
    filename you got from glob) is: the file names are written in another
    encoding than what the system uses for its filenames. Though, in what you
    presented as results of your own investigations I cannot see any indication
    that this is the case here.

    But perhaps it's in fact a strangeness of the Windows APIs used (or how
    they are used). At least you are right in saying this is strange and
    needs an explanation, if it's not the thing from above.

    I've since copied the files from the same thumb drive onto my linux
    system and retried the "glob" and "open" commands using 8.6.10- it all
    works. My Windows version is 8.6.12, a later version, so it appears
    that my problems are peculiar to Windows.

    What does DisplayCharCodes display under Linux? Do the emoji chars display as
    16-bit chars or two 8-bit characters?

    I see the emoji characters themselves followed by the 24-bit value.
    Just to be clear though, by "emoji characters themselves" I mean as they
    are displayed as in my earlier post, not as they are displayed above.

    If I pipe the filename into a file and octal dump the file, I see these
    byte values (octal) where the emoji characters are:

    360 237 222 225 360 237 222 236 360 237 222 225

    which looks like UTF-8 encoding to me. Microsoft claims it uses UTF-16
    for its filenames so I'd guess the end-result is the same.
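
    For readers following along, the relationship between those octal bytes,
    the code points printed by DisplayCharCodes, and the UTF-16 form can be
    checked with plain integer arithmetic. The following is only a sketch: the
    proc names utf8Decode4 and utf16Surrogates are made up for this
    illustration, and the decoder handles nothing but 4-byte UTF-8 sequences.
    It does not explain why "open" fails; it only shows that the dump and the
    code points agree, and that each emoji needs two 16-bit units in UTF-16.

```tcl
# Hypothetical helper: decode one 4-byte UTF-8 sequence, given as a list
# of octal strings (as printed by an octal dump), into a code point.
proc utf8Decode4 {octalBytes} {
    set b {}
    foreach o $octalBytes {
        lappend b [scan $o %o]
    }
    lassign $b b0 b1 b2 b3
    # Lead byte carries 3 payload bits, each continuation byte carries 6.
    return [expr {(($b0 & 0x07) << 18) | (($b1 & 0x3F) << 12) |
                  (($b2 & 0x3F) << 6)  |  ($b3 & 0x3F)}]
}

# Hypothetical helper: split a code point above U+FFFF into the
# high and low UTF-16 surrogates.
proc utf16Surrogates {cp} {
    set v  [expr {$cp - 0x10000}]
    set hi [expr {0xD800 | ($v >> 10)}]
    set lo [expr {0xDC00 | ($v & 0x3FF)}]
    return [list $hi $lo]
}

# The two distinct byte sequences from the octal dump above.
foreach seq {{360 237 222 225} {360 237 222 236}} {
    set cp [utf8Decode4 $seq]
    lassign [utf16Surrogates $cp] hi lo
    puts [format {U+%X -> UTF-16 %04X %04X} $cp $hi $lo]
}
```

    This prints U+1F495 -> UTF-16 D83D DC95 and U+1F49E -> UTF-16 D83D DC9E,
    matching the 0x1f495 and 0x1f49e values reported earlier.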



    Thanks for your continuing interest- it's helped motivate me to look
    deeper into the problem.

    -mike


  • From Rolf Ade@21:1/5 to Michael Soyka on Sat Sep 9 00:28:18 2023
    Michael Soyka writes:
    I've since copied the files from the same thumb drive onto my linux
    system and retried the "glob" and "open" commands using 8.6.10- it all
    works. My Windows version is 8.6.12, a later version, so it appears
    that my problems are peculiar to Windows.

    For the record: I also saw the emojis as character glyphs. They are just
    ordinary Unicode code points in UTF-8 encoding; your system should be
    able to handle this, and Tcl certainly should too.

    This should be easy for any Windows user reading along to test. The file
    name in question is:

    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml

    Use the file explorer to create a file with that name. Then use Tcl
    8.6.12 and look at what glob returns for the directory with the file in
    it. Then try to open the file name returned from glob and try file
    exists.

    On Linux this all works well. I used emacs to create a file with the
    name from above in an otherwise empty directory. Then, in an interactive
    tclsh session:

    glob *
    {NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml}
    set filename [lindex [glob *] 0]
    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
    set fd [open $filename]
    file3
    close $fd
    file exists $filename
    1

    Mike, can you reproduce the issue with this recipe?

    rolf

  • From Michael Soyka@21:1/5 to Rolf Ade on Fri Sep 8 20:40:41 2023
    On 09/08/2023 6:28 PM, Rolf Ade wrote:

    Mike, can you reproduce the issue with this recipe?

    Assuming you meant running this in my Windows box, the answer is no- it
    still fails in exactly the same way in a new, empty directory:

    % set filename [lindex [glob -type f *] 0]
    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
    % open $filename r
    couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml": filename is invalid on this platform

    I haven't included the output from DisplayCharCodes but I promise it is
    the same as shown way-back above.

    Aside for the benefit of other readers.
    Entering the emoji characters in Windows Explorer using the keyboard did
    not work (I tried several methods). I had to create the characters in
    Wordpad and paste them into Windows Explorer while renaming the file.


  • From Harald Oehlmann@21:1/5 to All on Sat Sep 9 10:09:29 2023
    On 09.09.2023 at 00:28, Rolf Ade wrote:

    Mike, can you reproduce the issue with this recipe?

    rolf

    I can confirm that I am able to reproduce this. I am in a folder, c:\test
    on an NTFS file system, that contains only this file:

    % set l [glob *]
    {NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml}
    % set f [lindex $l 0]
    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
    % file exists $f
    0
    % open $f r
    couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml": filename is invalid on this platform
    % info patchlevel
    8.6.13

    This is Tcl 8.6.13, 32-bit, self-compiled with MS-VC6.
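
    For anyone who wants to check a whole directory rather than a single
    file, the manual steps above (glob, then file exists, then open) can be
    wrapped in a small proc. This is a rough sketch only: the name
    globRoundTrip is invented for this example, it uses nothing beyond the
    standard glob, file exists, open and close commands, and it merely
    reports the affected names without working around the problem.

```tcl
# Hypothetical diagnostic: list every regular file that glob returns
# but that the same interpreter then fails to find or open.
# Returns a list of {name reason} pairs; an empty list means no problems.
proc globRoundTrip {dir} {
    set problems {}
    foreach path [glob -nocomplain -directory $dir -type f *] {
        if {![file exists $path]} {
            lappend problems [list $path "file exists returned 0"]
            continue
        }
        if {[catch {open $path r} result]} {
            # On failure, $result holds the error message from open.
            lappend problems [list $path $result]
        } else {
            # On success, $result is the channel; release it again.
            close $result
        }
    }
    return $problems
}

# Example use on the current directory.
foreach item [globRoundTrip [pwd]] {
    lassign $item name reason
    puts "$name: $reason"
}
```

    On a system without the problem the loop prints nothing for readable
    files. Names that show up in the output are good candidates to attach
    to a bug report.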

    Take care,
    Harald

  • From Rolf Ade@21:1/5 to Michael Soyka on Sat Sep 9 13:07:26 2023
    Michael Soyka writes:
    On 09/08/2023 6:28 PM, Rolf Ade wrote:
    This should be easy for any Windows user reading along to test. The file
    name in question is:

    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml

    Use the file explorer to create a file with that name. Then use Tcl
    8.6.12 and look at what glob returns for the directory with the file in
    it. Then try to open the file name returned from glob and try file
    exists.
    On Linux this all works well. [...]

    Mike, can you reproduce the issue with this recipe?

    Assuming you meant running this in my Windows box, the answer is no-
    it still fails in exactly the same way in a new, empty directory:

    % set filename [lindex [glob -type f *] 0]
    NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
    % open $filename r
    couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml": filename is invalid on this platform

    So you can reproduce the issue with this.

    Since Harald has confirmed it, it is high time for a bug report. Please
    open a ticket on https://core.tcl-lang.org/tcl. What you are trying to do
    should work, and does work on Linux; this looks like a Windows platform
    issue.

    rolf

  • From Harald Oehlmann@21:1/5 to All on Sat Sep 9 14:15:36 2023
    On 09.09.2023 at 13:07, Rolf Ade wrote:
    Since Harald has confirmed it, it is high time for a bug report. Please
    open a ticket on https://core.tcl-lang.org/tcl. What you are trying to do
    should work, and does work on Linux; this looks like a Windows platform
    issue.

    rolf

    Done here:

    https://core.tcl-lang.org/tcl/info/43b065660532eb4a

    Please continue the discussion there!

    Thank you all,
    Harald
