Luc <[email protected]d> wrote:
I know how to tell Tcl to write with a certain encoding, but I never implemented that, and I've been thinking that I should probably keep
the existing encoding in most cases. For that I have to be able
to tell what encoding is already there.
I believe Tcl cannot do that. I've been researching, and it seems
that we need external software for that, namely 'file' and 'enca',
neither of which is super reliable.
Actually, absent side-channel information, it is impossible to tell
with 100% certainty what 'encoding' a given file has been encoded with.
The best you can do is verify that a given file does not contain any
illegal sequences for the expected encoding. Heuristics of this kind
will get you roughly 95% of the way there, but something can always slip through.
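To make the "no illegal sequences" check concrete, here is a minimal Python sketch. The function names and the candidate list are hypothetical, chosen only for illustration. It also shows why the check can never be conclusive: the same bytes are often legal in several encodings, and latin-1 accepts every possible byte sequence, so passing the check for latin-1 proves nothing.

```python
from __future__ import annotations


def has_no_illegal_sequences(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly under `encoding` (strict mode)."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def plausible_encodings(data: bytes,
                        candidates=("utf-8", "cp1252", "latin-1")) -> list[str]:
    """Return the candidates under which `data` is at least well-formed.

    A non-empty result only means "not ruled out", never "confirmed".
    """
    return [enc for enc in candidates if has_no_illegal_sequences(data, enc)]


if __name__ == "__main__":
    legacy = "café".encode("cp1252")   # b'caf\xe9': a lone 0xE9 is illegal UTF-8
    modern = "café".encode("utf-8")    # b'caf\xc3\xa9': legal in all three
    print(plausible_encodings(legacy))  # ['cp1252', 'latin-1']
    print(plausible_encodings(modern))  # ['utf-8', 'cp1252', 'latin-1']
```

The second case is the one that slips through: UTF-8 bytes also decode "successfully" as cp1252 (as `cafÃ©`), so only a human, or an extra heuristic, can tell which reading is right.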
What experience do you have with that? Can you share any suggestions
or recommendations?
For reading, if you assume UTF-8, you'll be right more often than wrong
for anything modern. The older the "text file" you plan to edit, the
greater the chance that UTF-8 is the wrong choice. And there will
always be a few files where you just have to make a guess and see
whether the result looks right.
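One way to code the "assume UTF-8, then guess" strategy is sketched below in Python. The function name is hypothetical, and the fallback order (cp1252, then latin-1 as a last resort) is an assumption suited to Western European legacy text, not a rule. The function returns the encoding it settled on, so an editor could reuse it when saving, which is what the original question was after.

```python
from __future__ import annotations

import codecs

# Assumed fallback order for legacy files; adjust for your users' locales.
LEGACY_FALLBACKS = ("cp1252",)


def decode_guess(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), trying UTF-8 first."""
    try:
        text = data.decode("utf-8-sig")  # strips a leading UTF-8 BOM if present
    except UnicodeDecodeError:
        pass
    else:
        # Remember whether there was a BOM so it can be written back the same way.
        enc = "utf-8-sig" if data.startswith(codecs.BOM_UTF8) else "utf-8"
        return text, enc

    for enc in LEGACY_FALLBACKS:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue

    # latin-1 maps every byte to a code point, so this last resort cannot fail,
    # but it may well be the wrong guess: check the result by eye.
    return data.decode("latin-1"), "latin-1"
```

For example, `decode_guess(b"caf\xe9")` falls through UTF-8 and returns `("café", "cp1252")`, while a file containing a byte such as 0x81, which cp1252 leaves undefined, ends up at the latin-1 guess.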
For writing, just create everything as UTF-8 unless you have a *very*
good reason to do otherwise.
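A small write helper along those lines, again a Python sketch with a hypothetical name: UTF-8 is the default, and a caller with a *very* good reason can pass another encoding. Strict error handling makes a legacy encoding raise an error on a character it cannot represent, instead of silently corrupting the file. UTF-8 does not hit that case for ordinary text, because it can encode every Unicode character.

```python
from __future__ import annotations

from pathlib import Path


def save_text(path: str | Path, text: str, encoding: str = "utf-8") -> None:
    """Write `text` to `path`, defaulting to UTF-8.

    errors="strict" raises UnicodeEncodeError if `encoding` cannot
    represent some character, rather than writing a lossy file.
    """
    data = text.encode(encoding, errors="strict")
    # Writing bytes (not text mode) avoids any newline translation.
    Path(path).write_bytes(data)
```

Calling `save_text("notes.txt", "price: €5", encoding="latin-1")` would raise, since latin-1 has no euro sign; the same call with the default UTF-8 succeeds.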
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)