• Save/load irregular Chinese Characters

    From WJG@21:1/5 to All on Wed Apr 5 00:20:27 2023
    By and large I have no problems working with Tcl and unicode strings of Chinese characters. However, certain variant forms become garbled when saving. How should I configure my output stream to overcome this? The particular character is U+21E40.

    https://ctext.org/dictionary.pl?if=en&char=%F0%A1%B9%80

    Many thanks in advance for any comments.

    W.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to WJG on Wed Apr 5 12:20:25 2023
    This works for me:

    # writing it out:
    set f [open $path wb]
    fconfigure $f -encoding utf-8 -translation auto
    puts $f {Hello garbled character "𡹀"!}
    close $f


    # reading it in (puts makes the result visible outside an interactive shell):
    set f [open $path rb]
    fconfigure $f -encoding utf-8 -translation auto
    puts [read $f]
    close $f

    This prints out the following line:
    Hello garbled character "𡹀"!

  • From Ralf Fassel@21:1/5 to All on Wed Apr 5 19:00:00 2023
    https://core.tcl-lang.org/tips/doc/trunk/tip/388.md
    https://core.tcl-lang.org/tips/doc/trunk/tip/389.md
    https://wiki.tcl-lang.org/page/Unicode+and+UTF-8

    Tcl did not handle Unicode characters outside the BMP (which U+21E40
    is). TIP 388 is supposed to solve this, and it is listed as

    Tcl-Version: 8.6
    Tcl-Branch: tip-388-impl

    but I suspect that means it is not yet available in the trunk.

    At least in my Tcl 8.6.13

    set x \u21E40

    sets x to \u21E4 followed by a "0".
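
    A possible explanation for this result, offered as a sketch rather than
    a diagnosis: the \u escape reads at most four hexadecimal digits, so
    \u21E40 parses as U+21E4 followed by a literal "0". The \U escape
    described in TIP 388 accepts up to eight digits. The snippet below
    assumes Tcl 8.6 or later; what string length reports for the second
    string depends on the Tcl version and build, so no output is claimed
    for it.

```tcl
# \u consumes at most four hex digits, so this is U+21E4 plus a literal "0".
set x "\u21E40"
set first [scan [string index $x 0] %c]   ;# code point of the first character
set rest  [string range $x 1 end]         ;# whatever follows it
puts [format "U+%04X then \"%s\"" $first $rest]   ;# U+21E4 then "0"

# \U takes up to eight hex digits and is the escape meant for code points
# beyond U+FFFF. The reported length is version- and build-dependent.
set y "\U00021E40"
puts "string length of y: [string length $y]"
```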

    Your code saves the U+21E40 char on disk as the byte sequence

    \360\241\271\200

    (I have no clue whether this is correct UTF-8 for U+21E40).
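
    One way to check the byte sequence independently of how a given Tcl
    build stores the character is to apply the UTF-8 bit layout to the code
    point by hand. The sketch below uses integer arithmetic only; the proc
    name utf8_bytes is made up for this illustration and is not part of Tcl.

```tcl
# Encode one Unicode code point as a list of UTF-8 byte values.
# utf8_bytes is a hypothetical helper, not a Tcl built-in.
proc utf8_bytes {cp} {
    if {$cp < 0x80} {
        # 1 byte: 0xxxxxxx
        return [list [expr {$cp}]]
    } elseif {$cp < 0x800} {
        # 2 bytes: 110xxxxx 10xxxxxx
        return [list [expr {0xC0 | ($cp >> 6)}] \
                     [expr {0x80 | ($cp & 0x3F)}]]
    } elseif {$cp < 0x10000} {
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [list [expr {0xE0 | ($cp >> 12)}] \
                     [expr {0x80 | (($cp >> 6) & 0x3F)}] \
                     [expr {0x80 | ($cp & 0x3F)}]]
    } else {
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return [list [expr {0xF0 | ($cp >> 18)}] \
                     [expr {0x80 | (($cp >> 12) & 0x3F)}] \
                     [expr {0x80 | (($cp >> 6) & 0x3F)}] \
                     [expr {0x80 | ($cp & 0x3F)}]]
    }
}

set hex {}
set oct {}
foreach b [utf8_bytes 0x21E40] {
    lappend hex [format %02x $b]
    lappend oct [format \\%03o $b]
}
puts [join $hex " "]   ;# f0 a1 b9 80
puts [join $oct ""]    ;# \360\241\271\200
```

    The octal form matches the bytes found on disk, so the file content
    itself is valid UTF-8 for U+21E40.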

    R'

  • From saitology9@21:1/5 to Ralf Fassel on Wed Apr 5 13:20:48 2023
    On 4/5/2023 1:00 PM, Ralf Fassel wrote:
    > (I have no clue whether this is correct UTF-8 for U+21E40).

    Me neither :-)

    I don't have the proper setup to enter such characters manually. The
    issues you see are perhaps related to data entry. The recent posts about
    Unicode entry into text widgets might be relevant if you want to try
    (the solutions posted there did not work for me, by the way). If you
    copy/paste the Chinese character, you will see that the code works fine.
    I assume the original poster can enter these characters directly. So ;-)

  • From Harald Oehlmann@21:1/5 to All on Wed Apr 5 20:19:28 2023
    Dear WJG,

    thanks for your message!

    https://onlineunicodetools.com/convert-unicode-to-utf8

    Translates

    𡹀

    to

    f0 a1 b9 80

    Ralf has given (in octal ;-) ) : \360\241\271\200

    which is correct:

    % set s \360\241\271\200
    𡹀
    (bin) 2 % scan $s %c%c%c%c
    240 161 185 128
    (bin) 3 % format "%X %x %x %x" {*}[scan $s %c%c%c%c]
    F0 a1 b9 80

    So, that should work for any other program.

    Note that the internal representation in Tcl 8.6.11 through Tcl 8.7.99
    is a pair of surrogates for any non-BMP character:

    % set c 𡹀
    𡹀
    (bin) 5 % string length $c
    2

    This is an ultra-hack and may be fixed by:
    1) using Tcl 9.x
    2) compiling Tcl with TCL_UTF_MAX set larger than 3 (non-standard).

    You may try AndroWish and friends to get real support for this now, as
    they use option 2 above.
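
    The two code units behind the string length of 2 can be worked out from
    the standard UTF-16 surrogate formula. The following is a minimal
    sketch using integer arithmetic only, so it behaves the same on any Tcl
    build; the proc name surrogate_pair is invented for this example.

```tcl
# Split a code point above U+FFFF into its UTF-16 high and low surrogates.
# surrogate_pair is a hypothetical helper, not a Tcl built-in.
proc surrogate_pair {cp} {
    if {$cp < 0x10000 || $cp > 0x10FFFF} {
        error "not a supplementary code point: [format U+%04X $cp]"
    }
    set v    [expr {$cp - 0x10000}]           ;# 20-bit offset into the planes above the BMP
    set high [expr {0xD800 + ($v >> 10)}]     ;# top 10 bits
    set low  [expr {0xDC00 + ($v & 0x3FF)}]   ;# bottom 10 bits
    return [list $high $low]
}

lassign [surrogate_pair 0x21E40] high low
puts [format "%04X %04X" $high $low]   ;# D847 DE40
```

    On a build that stores non-BMP characters as surrogates, these two
    values are what string length is counting.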

    And you may write a big posting everywhere:
    - that you are confused
    - and that you want 9.0 now
    (sorry, that is half a joke, but partly very true)

    Take care,
    Harald

  • From WJG@21:1/5 to Harald Oehlmann on Fri Apr 7 01:08:31 2023
    Thanks for taking the time to answer my post. So, no active support for this range. I'm coding on Linux so I'll explore using the glib route. TBH, it's not that much of a nightmare; the character I presented is basically a 'typo' that made its way into the main dictionaries.

    When, if ever, is Tcl 9 to be released to the distros?

    WJG

  • From Harald Oehlmann@21:1/5 to All on Fri Apr 7 11:37:10 2023
    Hi WJG,

    thanks for the posting.
    If you are on 8.6.11+, you have the support described above. It works,
    with the drawbacks mentioned.
    Or you use undroidwish for Linux, which works out of the box.
    Or you compile on your own with a non-standard compile option.

    At our last German Tcl teleconference, we were told that the last big
    obstacles for Tcl 9 have been resolved. So it may go quickly, or not...

    Take care,
    Harald
