• When do we need "encoding system"?

    From Alexandru@21:1/5 to All on Tue Aug 2 09:30:16 2022
    Recently I thought it would be a good idea to add "encoding system utf-8" to my code.
    After that I realized that the icons of Windows folders in the treectrl package are
    not shown anymore if the folder path contains special characters such as umlauts.
    So I must revert back. But when do we need this command anyway?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich@21:1/5 to Alexandru on Tue Aug 2 18:16:23 2022
    Alexandru <[email protected]> wrote:
    > Recently I thought it would be a good idea to add "encoding system
    > utf-8" to my code.

    Will only work right if the OS system call encoding is also UTF-8.

    > After that I realized that the icons of Windows folders in the
    > treectrl package are not shown anymore if the folder path contains
    > special characters such as umlauts. So I must revert back.

    Yup, expected, as Windows system calls are likely largely still UTF-16.

    > But when do we need this command anyway?

    When you need to change the encoding for a system call that accepts
    something other than the overall default for the rest. From the man page:

    encoding system ?encoding?
    Set the system encoding to encoding. If encoding is omitted then
    the command returns the current system encoding. The system
    encoding is used whenever Tcl passes strings to system calls.

    The key phrase is the last sentence.

    Overall, unless you are testing obscure things, it is probably best to
    leave the system encoding alone.
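
If the system encoding does have to be changed for a test, one way to limit the damage is to restore it right afterwards. The following is only a minimal sketch, assuming Tcl 8.6 or later (for `try ... finally`); `withSystemEncoding` is a hypothetical helper name and not part of Tcl.

```tcl
# Query the current system encoding. The value depends on platform and
# locale (for example cp1252 on a Western European Windows setup).
set saved [encoding system]
puts "system encoding: $saved"

# Hypothetical helper: evaluate a script with a different system encoding
# and always restore the previous value, even if the script fails.
proc withSystemEncoding {enc script} {
    set previous [encoding system]
    encoding system $enc
    try {
        uplevel 1 $script
    } finally {
        encoding system $previous
    }
}

withSystemEncoding utf-8 {
    puts "inside: [encoding system]"
}
puts "after:  [encoding system]"
```

The last line should report the same encoding as the first, so code that runs afterwards still passes strings to system calls in the encoding the platform expects.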

  • From Alexandru@21:1/5 to Rich on Tue Aug 2 13:42:39 2022
    Rich wrote on Tuesday, 2 August 2022 at 21:16:27 UTC+3:
    > Alexandru <[email protected]> wrote:
    >> Recently I thought it would be a good idea to add "encoding system
    >> utf-8" to my code.
    > Will only work right if the OS system call encoding is also UTF-8.
    >> After that I realized that the icons of Windows folders in the
    >> treectrl package are not shown anymore if the folder path contains
    >> special characters such as umlauts. So I must revert back.
    > Yup, expected, as Windows system calls are likely largely still UTF-16.
    >> But when do we need this command anyway?
    > When you need to change the encoding for a system call that accepts
    > something other than the overall default for the rest. From the man page:
    >
    > encoding system ?encoding?
    > Set the system encoding to encoding. If encoding is omitted then
    > the command returns the current system encoding. The system
    > encoding is used whenever Tcl passes strings to system calls.
    >
    > The key phrase is the last sentence.
    >
    > Overall, unless you are testing obscure things, it is probably best to
    > leave the system encoding alone.

    Thanks Rich for the explanation.
    I think Windows uses cp1252.
    So it's a mess: I write files typically in utf-8, read them back in utf-8.
    All the application data is encoded in utf-8 although the system encoding is cp1252.
    E.g. when I use CAWT to read an Excel file, its content is cp1252, but somehow this still works?

    Regards
    Alexandru

  • From Rich@21:1/5 to Alexandru on Tue Aug 2 21:21:10 2022
    Alexandru <[email protected]> wrote:
    > Rich wrote on Tuesday, 2 August 2022 at 21:16:27 UTC+3:
    >> Alexandru <[email protected]> wrote:
    >>> Recently I thought it would be a good idea to add "encoding system
    >>> utf-8" to my code.
    >>
    >> Overall, unless you are testing obscure things, it is probably best to
    >> leave the system encoding alone.
    >
    > Thanks Rich for the explanation.
    > I think Windows uses cp1252.

    cp1252 is a code page (a character mapping), UTF-16 is an encoding - two
    different, but related, items. Character mappings define what character
    each integer value represents (such as 65 meaning capital letter A in
    ASCII). Encodings are how the integers are stored in memory (in the case
    of UTF-16, as 16-bit code units).

    > So it's a mess: I write files typically in utf-8, read them back in
    > utf-8.

    Yep, and most new work really should be in UTF-8, unless you need
    something else due to 'legacy'.

    > All the application data is encoded in utf-8 although the system
    > encoding is cp1252.

    Again, that legacy stuff... :)

    > E.g. when I use CAWT to read an Excel file, its content is cp1252
    > but somehow this still works?

    Yes, because Tcl transparently converts it from cp1252 (or whatever
    encoding it is stored in) for you.
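
The difference between a character and the bytes used to store it can be made visible with `encoding convertto` and `encoding convertfrom`. This is only an illustrative sketch, assuming Tcl 8.6 or later (for `binary encode hex`); the variable names are made up for the example.

```tcl
# U+00E4 (a with diaeresis), written as an escape so the script does not
# depend on the encoding of its own source file.
set s "\u00e4"

# One character, different byte sequences depending on the encoding.
foreach enc {cp1252 utf-8} {
    set bytes [encoding convertto $enc $s]
    puts [format "%-7s %s" $enc [binary encode hex $bytes]]
}

# Decoding with the encoding that produced the bytes restores the character.
set roundtrip [encoding convertfrom cp1252 [encoding convertto cp1252 $s]]

# Decoding UTF-8 bytes as cp1252 yields two wrong characters instead of one.
set garbled [encoding convertfrom cp1252 [encoding convertto utf-8 $s]]

puts "round trip ok: [expr {$roundtrip eq $s}]"
puts "garbled length: [string length $garbled]"
```

The loop should print `e4` for cp1252 and `c3a4` for utf-8. Inside Tcl the string is the same single character in both cases; only the external byte representation differs.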

  • From Ralf Fassel@21:1/5 to All on Wed Aug 3 10:07:04 2022
    * Rich <[email protected]d>
    | Alexandru <[email protected]> wrote:
    | > Recently I thought it would be a good idea to add "encoding system
    | > utf-8" to my code.

    | Will only work right if the OS system call encoding is also UTF-8.

    | > After that I realized that the icons of Windows folders in the
    | > treectrl package are not shown anymore if the folder path contains
    | > special characters such as umlauts. So I must revert back.

    | Yup, expected, as Windows system calls are likely largely still UTF-16.

    I'm not convinced that this is the real reason for that error.

    In my experience, the file handling functions on Windows don't care
    about the system encoding when it comes to the *name* of the file - they
    simply convert Tcl's internal rep to wide char
    (win/tclWinFile.c:TclNativeCreateNativeRep(), using
    MultiByteToWideChar() from CP_UTF8).

    In contrast, the code on Unix indeed uses the system encoding to get the
    file name to open (unix/tclUnixFile.c:TclNativeCreateNativeRep() uses
    Tcl_UtfToExternalDString(NULL,) where the NULL denotes the system
    encoding).

    I rather suspect that the file *reading* behind the scenes relies on the
    system encoding being 'correct'. That might fail if the system encoding
    is set to utf-8, but the file content is not stored in utf-8.
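
The effect of a mismatch between the encoding of the file content and the encoding used to read it can be reproduced without touching the system encoding at all. The following is a rough sketch under some assumptions: the file name `encoding-demo.txt` and the helper `readAs` are hypothetical, and the channel encoding is set explicitly with `fconfigure`. In Tcl 8.6 a newly opened channel that is not configured this way defaults to the system encoding, which is how the system encoding can end up affecting file reads.

```tcl
# Hypothetical scratch file in the current directory.
set path "encoding-demo.txt"

# "Mueller" spelled with u-umlaut; the escape avoids source-encoding issues.
set text "M\u00fcller"

# Write the text as UTF-8, stating the channel encoding explicitly.
set ch [open $path w]
fconfigure $ch -encoding utf-8
puts -nonewline $ch $text
close $ch

# Hypothetical helper: read the whole file using the given encoding.
proc readAs {path enc} {
    set ch [open $path r]
    fconfigure $ch -encoding $enc
    set data [read $ch]
    close $ch
    return $data
}

set good [readAs $path utf-8]
set bad  [readAs $path cp1252]

puts "read as utf-8:  [string length $good] characters"
puts "read as cp1252: [string length $bad] characters"

file delete $path
```

Read as utf-8 the text comes back unchanged with 6 characters; read as cp1252 the two UTF-8 bytes of the umlaut turn into two separate characters, giving 7. Setting `-encoding` on each channel keeps the result independent of whatever `encoding system` returns.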

    R'
