• Re: UTF-8 and latin1

    From Stefan Ram@21:1/5 to Stefan Ram on Tue Oct 25 10:16:58 2022
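    Stefan Ram writes:
    You can let Python guess the encoding of a file.

    import pathlib

    def encoding_of( name ):
        path = pathlib.Path( name )
        for encoding in( "utf_8", "cp1252", "latin_1" ):
            try:
                with path.open( encoding=encoding, errors="strict" )as file:
                    # The archived post breaks off here; the rest is an assumed completion.
                    file.read()  # raises UnicodeDecodeError if the encoding does not fit
                return encoding
            except UnicodeDecodeError:
                pass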

    I also read a book which claimed that the tkinter.Text
    widget would accept bytes and guess whether these are
    encoded in UTF-8 or "ISO 8859-1" and decode them
    accordingly. However, today I found that here it does
    accept bytes but it always guesses "ISO 8859-1".

    main.py

    import tkinter

    text = tkinter.Text()
    text.insert( tkinter.END, "AÄäÖöÜüß".encode( encoding='ISO 8859-1' ))
    text.insert( tkinter.END, "AÄäÖöÜüß".encode( encoding='UTF-8' ))
    text.pack()
    print( text.get( "1.0", "end" ))

    output

    AÄäÖöÜüßAÄäÖöÜüß
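
    To see what "always guessing ISO 8859-1" would mean for UTF-8 input, here is a small sketch that decodes both byte strings by hand, without tkinter. The helper name as_latin_1 is made up for illustration; it is not part of tkinter or of the book mentioned above.

```python
SAMPLE = "AÄäÖöÜüß"

def as_latin_1(data: bytes) -> str:
    """Decode bytes as ISO 8859-1; this never fails, since every byte maps to one character."""
    return data.decode("latin_1")

latin_bytes = SAMPLE.encode("latin_1")  # one byte per character
utf8_bytes = SAMPLE.encode("utf_8")     # two bytes for each non-ASCII character

print(len(latin_bytes), len(utf8_bytes))     # 8 15
print(as_latin_1(latin_bytes) == SAMPLE)     # True
print(as_latin_1(utf8_bytes) == SAMPLE)      # False: each umlaut turns into two characters
print(utf8_bytes.decode("utf_8") == SAMPLE)  # True
```

    Decoding explicitly before calling text.insert avoids relying on whatever the widget guesses.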

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry Scott@21:1/5 to All on Tue Oct 25 18:59:06 2022
    On 25 Oct 2022, at 11:16, Stefan Ram wrote:

    I also read a book which claimed that the tkinter.Text
    widget would accept bytes and guess whether these are
    encoded in UTF-8 or "ISO 8859-1" and decode them
    accordingly. However, today I found that here it does
    accept bytes but it always guesses "ISO 8859-1".

    The best you can do is assume that if the text cannot decode as utf-8 it may be 8859-1.
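
    A minimal sketch of that fallback, assuming the data is already in memory as bytes. The function name decode_guess is hypothetical, and the annotation syntax assumes Python 3.9 or later.

```python
def decode_guess(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), preferring UTF-8 over ISO 8859-1."""
    try:
        return data.decode("utf_8", errors="strict"), "utf_8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as latin_1, so this branch cannot fail,
        # but the result is only a guess.
        return data.decode("latin_1"), "latin_1"

print(decode_guess("Ä".encode("utf_8"))[1])    # utf_8
print(decode_guess("Ä".encode("latin_1"))[1])  # latin_1
```

    The guess can still be wrong in the other direction: ISO 8859-1 text whose bytes happen to form valid UTF-8 sequences is reported as utf_8.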

    Barry



    --
    https://mail.python.org/mailman/listinfo/python-list

  • From Chris Angelico@21:1/5 to Barry Scott on Wed Oct 26 08:05:09 2022
    On Wed, 26 Oct 2022 at 05:09, Barry Scott wrote:



    The best you can do is assume that if the text cannot decode as utf-8 it may be 8859-1.


    Except when it's Windows-1252.
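
    A short sketch of where the two encodings disagree, using only Python's built-in codecs. The helper decodes_as is a made-up name for illustration.

```python
# Bytes 0x80..0x9F are where Windows-1252 and ISO 8859-1 disagree.
euro = b"\x80"
print(euro.decode("cp1252") == "\u20ac")  # True: EURO SIGN in Windows-1252
print(euro.decode("latin_1") == "\x80")   # True: a C1 control character in ISO 8859-1

def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether data decodes under encoding with strict error handling."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False

# Python's cp1252 codec leaves five byte values undefined, so strict decoding can fail.
undefined = [b for b in range(256) if not decodes_as(bytes([b]), "cp1252")]
print([hex(b) for b in undefined])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

    So a strict cp1252 attempt only rules Windows-1252 out when one of those five bytes occurs; otherwise both decodings succeed and the choice stays a guess.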

    ChrisA

  • From moi@21:1/5 to All on Thu Oct 27 08:26:02 2022
    Latin-1 - Windows-1252

    In good software today, latin-1 is treated as an alias for Windows-1252.

    Latin-1 was badly designed and is unusable.
    In "unicode" latin-1 deliberately does not exist.

    That’s why Monsieur Adrian MUŸ can have a working mailing address
    and can order a train ticket from his desktop.
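
    As a rough illustration of the name example, the sketch below checks which codec can encode it. Note that Python keeps latin_1 and cp1252 as separate codecs, whereas web browsers following the WHATWG Encoding Standard treat the label latin1 as windows-1252. The helper name encodable is made up.

```python
name = "MUŸ"  # the last letter is U+0178

def encodable(text: str, encoding: str) -> bool:
    """Report whether text can be encoded under encoding with strict error handling."""
    try:
        text.encode(encoding, errors="strict")
        return True
    except UnicodeEncodeError:
        return False

print(encodable(name, "cp1252"))           # True
print(encodable(name, "latin_1"))          # False: U+0178 is outside ISO 8859-1
print(name.encode("cp1252") == b"MU\x9f")  # True
```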
