Forum: >>> Magnum BBS <<<

Dealing with encodings

From Luc@21:1/5 to All on Fri Feb 24 15:24:09 2023

I have a basic text editor that, well, edits text files.

I've been using it for a long time without ever giving
a thought to encodings. I just open, edit and save.
I never knew or cared about what encodings were involved.

I want to change that.

I know how to tell Tcl to write with a certain encoding.
But I never implemented that and I've been thinking that
I should probably keep the existing encoding in most cases,
and for that I have to be able to tell what encoding is
there already.

I believe Tcl cannot do that. I've been researching and
it seems that we need external software to do that, namely
'file' and 'enca', neither of which is super reliable.

What experience do you have with that? Can you share any
suggestions or recommendations?

--
Luc

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Rich@21:1/5 to Luc on Sat Feb 25 06:10:20 2023

Luc <[email protected]d> wrote:

> I know how to tell Tcl to write with a certain encoding. But I never
> implemented that and I've been thinking that I should probably keep
> the existing encoding in most cases, and for that I have to be able
> to tell what encoding is there already.
>
> I believe Tcl cannot do that. I've been researching and it seems
> that we need external software to do that, namely 'file' and 'enca'
> neither of which is super reliable.

Actually, absent side-channel information, it is impossible to tell
with 100% certainty what 'encoding' a given file has been encoded with.

The best you can do is verify that a given file does not contain any
illegal sequences for the expected encoding. These kinds of
heuristics will get you 95% of the way there, but it will always be
possible for something to slip through.
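
As a rough illustration of that kind of check in Tcl, here is a minimal
sketch. The proc name looksLikeUtf8 is made up for this example, and the
round-trip comparison is just one way to spot illegal sequences: decode
the raw bytes as UTF-8, encode them again, and see whether you get the
same bytes back. The catch is there because some Tcl versions raise an
error on malformed input while others silently substitute characters.

```tcl
# Hypothetical helper: return 1 if $bytes survives a UTF-8
# decode/encode round trip unchanged, 0 otherwise.
# $bytes must be raw data, e.g. read from a channel opened in binary mode.
proc looksLikeUtf8 {bytes} {
    # Some Tcl versions throw an error on malformed UTF-8.
    if {[catch {encoding convertfrom utf-8 $bytes} text]} {
        return 0
    }
    if {[catch {encoding convertto utf-8 $text} again]} {
        return 0
    }
    # Other versions substitute characters instead of failing;
    # in that case the re-encoded bytes no longer match the input.
    return [expr {$again eq $bytes}]
}

puts [looksLikeUtf8 "caf\xC3\xA9"]   ;# bytes of a valid UTF-8 sequence
puts [looksLikeUtf8 "caf\xE9"]       ;# a lone Latin-1 style byte
```

A result of 1 only means "nothing illegal for UTF-8 was found". Plain
ASCII passes too, and so could text in some other encoding that happens
to form legal UTF-8 sequences. Details differ between Tcl versions, so
try it on the version you actually use.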

> What experience do you have with that? Can you share any suggestions
> or recommendations?

For reading, if you assume UTF-8, you'll be right more often than wrong
for anything modern. The older the "text file" you plan to edit, the
greater the probability that UTF-8 is the wrong choice. And there
will always end up being a few where you just have to make a guess and
see if it looks like it worked.
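
A sketch of that "try UTF-8 first, otherwise guess" approach, for what
it's worth. The proc name readTextGuess and the iso8859-1 fallback are
assumptions for the example, not a recommendation for your files; pick
whatever fallback fits the files you usually see. It returns the
encoding name along with the text, so the editor can remember what it
read and save the file the same way.

```tcl
# Hypothetical helper: read a file and return a two-element list
# {encodingName text}. UTF-8 is tried first; if the bytes do not
# round-trip cleanly, the fallback encoding is used as a guess.
proc readTextGuess {path {fallback iso8859-1}} {
    # Read raw bytes; no decoding happens at this point.
    set chan [open $path rb]
    set bytes [read $chan]
    close $chan

    # catch covers Tcl versions that raise an error on malformed
    # input; the eq test covers versions that substitute silently.
    if {![catch {encoding convertfrom utf-8 $bytes} text]
            && ![catch {encoding convertto utf-8 $text} again]
            && $again eq $bytes} {
        return [list utf-8 $text]
    }
    # Not legal UTF-8: decode with the guessed fallback instead.
    return [list $fallback [encoding convertfrom $fallback $bytes]]
}

# Usage: lassign [readTextGuess notes.txt] enc text
```

A pure ASCII file is reported as utf-8 here, which is harmless since
ASCII is a subset of it. When the fallback is used, the result is still
only a guess and may need a human to look at it.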

For writing, just create everything as UTF-8 unless you have a *very*
good reason to do otherwise.
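
In Tcl terms that could look like the sketch below. The proc name
writeText is made up; the encoding argument defaults to utf-8 and is
only there for the rare case where you do have that very good reason,
such as keeping the encoding a file was read with.

```tcl
# Hypothetical helper: write $text to $path, as UTF-8 unless told otherwise.
proc writeText {path text {enc utf-8}} {
    set chan [open $path w]
    # Set the channel encoding explicitly instead of relying on
    # the system default, which varies between machines.
    fconfigure $chan -encoding $enc
    puts -nonewline $chan $text
    close $chan
}

# Usage: writeText notes.txt $text
```

Line endings are left at the channel default here, so they follow the
platform; set -translation as well if you need to control that.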

