On Sat, 13 Feb 2021 11:09:57 -0700, Grant Taylor wrote:
On 2/13/21 10:49 AM, The Natural Philosopher wrote:
HTML is not there to specify odd characters - it can but its job is to format text.
All of the HTML codes for special characters tend to disagree with you.
©
€
™
...
Do a web search for "html special characters" and you will find long
lists.
I don't know in what version of HTML these were introduced. But I do know
that many of the basic ones have been there for at least 20 years (HTML
4?).
... and are still there in HTML 5
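For anyone who wants to see what those character references amount to, here is a minimal Python sketch. It assumes the standard entity names &copy;, &euro; and &trade; for the three characters listed above, and the variable names are only for illustration.

```python
import html

# Named and numeric character references, written as pure 7-bit ASCII.
references = "&copy; &euro; &trade; &#169; &#x20AC;"

# html.unescape maps each reference to the Unicode character it stands for.
decoded = html.unescape(references)
print(decoded)  # © € ™ © €

# Going the other way: characters outside ASCII become numeric references.
encoded = "© € ™".encode("ascii", errors="xmlcharrefreplace")
print(encoded)  # b'&#169; &#8364; &#8482;'
```

The point is that the page source stays ASCII either way; only the decoded text contains the "odd" characters.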
"TimS" <[email protected]> wrote
| > UTF-8 allows ANSI character sets to still be used. But it also
| > provides a way to fully support multi-byte characters only
| > where necessary. It's the one solution to support all languages
| > without changing the default of 1 character to 1 byte.
|
| It's only a default for ASCII, and the characters that ASCII supports. And
| when you say it allows ANSI character sets to be used, I take it you mean the
| characters that different ANSI pages supported, which under UTF-8 will most
| likely be 2-byte chars, rather than 1-byte but 8-bit values.
|
Most ANSI character sets are also 1 byte to 1 character.
It's only the DBCS languages that can't fit that model.
So first we had ASCII. Then we had ANSI with codepages,
and most languages could be fully represented in HTML
using META content type. **All of that is 1 byte to 1
character.** Only the DBCS languages were an exception.
And they used a system similar to UTF-8.
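The byte counts being argued about can be checked directly. This is only a sketch: it uses Python's cp1252 codec as a stand-in for "an ANSI codepage" and shift_jis as a stand-in for "a DBCS encoding", and the helper name byte_length is made up for the example.

```python
def byte_length(ch, encoding):
    """Number of bytes ch takes in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


# One ASCII letter, two characters found in Windows-1252, one kanji.
for ch in ["A", "é", "€", "日"]:
    print(
        ch,
        "cp1252:", byte_length(ch, "cp1252"),
        "shift_jis:", byte_length(ch, "shift_jis"),
        "utf-8:", byte_length(ch, "utf-8"),
    )
```

ASCII stays at one byte everywhere. The accented and currency characters are one byte in the codepage but two or three in UTF-8, and the kanji is two bytes in Shift-JIS and three in UTF-8, so both posters have part of the picture.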
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Browsers properly
display curly quotes, but I actually only have one Unicode
font on my system, which is Arial Unicode MS, weighing in at
24 MB. Nothing else will render most UTF-8 characters. For example,
the RichEdit window in Windows has supported UTF-8 for
some time. And I can use the ability in my own software.
On Sat, 13 Feb 2021 14:42:37 -0500
"Mayayana" <[email protected]> wrote:
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Anyone who has to support multiple languages tends to use unicode internally for the sake of sanity (I was for a while internationalisation specialist (among other hats) on the Yahoo! front page team). We had loads
of fun with external feeds claiming to be ISO8859-1 and sending Win-1252 - they're almost but not quite the same.
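The "almost but not quite the same" part is easy to demonstrate. A small sketch, assuming Python's latin-1 and cp1252 codecs as the two encodings in question; the sample bytes are invented for the example.

```python
# Bytes as a feed might send them: curly quotes and a euro sign in Windows-1252.
raw = b"\x93quoted\x94 \x80 5"

as_cp1252 = raw.decode("cp1252")   # what the sender meant
as_latin1 = raw.decode("latin-1")  # what the declared charset says

print(repr(as_cp1252))  # '“quoted” € 5'
print(repr(as_latin1))  # '\x93quoted\x94 \x80 5' (invisible C1 control characters)

# List every byte value on which the two encodings disagree.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("cp1252", errors="replace")
]
print(hex(differing[0]), hex(differing[-1]), len(differing))  # 0x80 0x9f 32
```

Only the 32 byte values from 0x80 to 0x9F differ, which is exactly where Windows-1252 keeps its curly quotes, dashes and the euro sign.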
but mostly it's about semantic markup honest.
... plus all the (X)HTML Symbols and characters - á &
£ ... which AFAIK can't be rendered with CSS.
On 2/13/21 11:11 AM, Ahem A Rivet's Shot wrote:
Nope, that's CSS's job. HTML's job is to add semantic markup - OK
they dropped the ball with <b>, <i>, <br>, <blink> as well as some
pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
That's the /current/ interpretation. 20 years ago, there was a
different interpretation.
Hello folks,
I've recently set up a Pi 2B and intend to play around with some stuff on it.
I was trying to run Mystic, but it seems that LXTerm is not very friendly to ANSI character codes. Is there a way to tweak it?
On 13 Feb 2021 at 21:05:47 GMT, Ahem A Rivet's Shot <[email protected]> wrote:
On Sat, 13 Feb 2021 14:42:37 -0500
"Mayayana" <[email protected]> wrote:
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Anyone who has to support multiple languages tends to use unicode
internally for the sake of sanity (I was for a while internationalisation
specialist (among other hats) on the Yahoo! front page team). We had loads
of fun with external feeds claiming to be ISO8859-1 and sending Win-1252 -
they're almost but not quite the same.
I convert everything to UTF-8. Windows tends to lie about which code-page it's using, anyway.
Not that it really matters. It's pretty much all ASCII.
On Sat, 13 Feb 2021 17:49:28 +0000
The Natural Philosopher <[email protected]d> wrote:
HTML is not there to specify odd characters - it can but its job is to
format text.
Nope, that's CSS's job. HTML's job is to add semantic markup - OK
they dropped the ball with <b>, <i>, <br>, <blink> as well as some pre-css font and colour properties and ..., but mostly it's about semantic markup honest.
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content. So by definition characters are not HTML. But still HTML defined how to encode those not
part of 7-bit ASCII.
On 13 Feb 2021 at 21:05:47 GMT, Ahem A Rivet's Shot <[email protected]> wrote:
On Sat, 13 Feb 2021 14:42:37 -0500
"Mayayana" <[email protected]> wrote:
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Anyone who has to support multiple languages tends to use unicode
internally for the sake of sanity (I was for a while internationalisation
specialist (among other hats) on the Yahoo! front page team). We had loads
of fun with external feeds claiming to be ISO8859-1 and sending Win-1252 -
they're almost but not quite the same.
I convert everything to UTF-8. Windows tends to lie about which code-page it's
using, anyway.
the question of which font doesn't enter
into character representation.
On 13/02/2021 18:11, Ahem A Rivet's Shot wrote:
Nope, that's CSS's job. HTML's job is to add semantic markup -
OK they dropped the ball with <b>, <i>, <br>, <blink> as well as some pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
CSS is part of HTML
On 13/02/2021 22:09, Axel Berger wrote:
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content. So by definition
characters are not HTML. But still HTML defined how to encode those not
part of 7-bit ASCII.
By saying 'content-type: UTF8' or whatever the exact magic spell is
On 13/02/2021 21:48, TimS wrote:
the question of which font doesn't enter
into character representation.
It does if the font in use has no representation of the glyph you are
trying to display
You won't get far trying to display Gujarati in Arial Narrow...
On 14 Feb 2021 at 03:36:19 GMT, The Natural Philosopher
<[email protected]d>
wrote:
On 13/02/2021 22:09, Axel Berger wrote:
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content. So by
definition characters are not HTML. But still HTML defined how to
encode those not part of 7-bit ASCII.
By saying 'content-type: UTF8' or whatever the exact magic spell is
Just start your html page with:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
End of.
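As a rough illustration of how a program on the receiving end might pick that declaration up, here is a sketch using Python's standard html.parser. The class and function names (CharsetSniffer, sniff_charset) are hypothetical, and real browsers follow a more involved detection procedure than this.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Records the first charset declared in a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            for part in (attr.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip().lower()


def sniff_charset(raw: bytes):
    """Return the declared charset of an HTML byte string, or None."""
    sniffer = CharsetSniffer()
    # The declaration itself is ASCII, so a lossy ASCII decode of the head is enough.
    sniffer.feed(raw[:1024].decode("ascii", errors="replace"))
    return sniffer.charset


page = (
    b"<!DOCTYPE html>\n<html>\n<head>\n"
    b'<meta http-equiv="content-type" content="text/html; charset=utf-8">\n'
)
print(sniff_charset(page))  # utf-8
```

The declaration has to sit near the top of the file and be readable as ASCII, which is why it works as a bootstrap for whatever encoding follows.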
On Sun, 14 Feb 2021 08:32:36 -0500
"Mayayana" <[email protected]> wrote:
So it's a good solution for webpages, but once you get into
entering, editing and storing multi-lingual text it gets very
complicated. Only for those of us who speak English is it
reasonable to say that UTF-8 makes everything easy.
Not so! Unicode is the enabler for anyone who needs to handle
multiple scripts and languages. Sure, if you just want CJK then you could
use SHIFT-JIS, but if you want to be able to hold text and not worry about
what script or language it belongs to, and mix scripts and languages
freely, then Unicode is the only solution.
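A quick way to see the difference, sketched in Python with an invented sample string: one string mixing three scripts round-trips through UTF-8, while a single-script encoding such as Shift-JIS cannot hold it.

```python
# One string mixing Latin, Japanese and Gujarati text.
text = "English, 日本語, ગુજરાતી"

# UTF-8 holds all of it and gives the same string back.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Shift-JIS is fine for the Japanese part on its own...
japanese_only = "日本語".encode("shift_jis")

# ...but has no code points for Gujarati, so the mixed string fails.
try:
    text.encode("shift_jis")
    shift_jis_ok = True
except UnicodeEncodeError:
    shift_jis_ok = False

print(round_trip == text, len(japanese_only), shift_jis_ok)  # True 6 False
```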
But in System Prefs -> Keyboard -> Keyboard-tab I have ticked
"Show Keyboard Viewer in menu bar".
"The Natural Philosopher" <[email protected]d> wrote
| CSS is part of HTML
|
It's part of web design but it's an entirely different system
and syntax. Though I suppose that's splitting hairs.
So all those ANSI pages need to go in the bin, really speaking.
"Ahem A Rivet's Shot" <[email protected]> wrote
| > It's part of web design but it's an entirely different system
| > and syntax. Though I suppose that's splitting hairs.
|
| Not really, there's an important separation CSS applies to any XML
| or SGML not just HTML.
|
You snipped my example. The "cascading" part applies
there. First is the CSS file. Then that's overridden by
CSS in the STYLE tag of the page. Then that can be
overridden by a STYLE attribute in the HTML tag. If
you want pretty fonts for your XML that's up to you,
but CSS is still deeply entangled with HTML. To not
"The Natural Philosopher" <[email protected]d> wrote
| > Not that it really matters. It's pretty much all ASCII.
| >
| >
| Schrödingers cat would disagree - or ½ of him would.
|
:) I always wonder how people end up using these characters.
There are ways to do it. I can copy the character from existing
text. On Windows I think there's Charmap, though I've never
used it. Schrodinger will just have to get by without his umlaut.
Just as "naive" has survived without one.
Then there's the matter of the mechanical entry system. My
keyboard only has ASCII and a few extras.
Where this really helps is with things like Chinese. But it only
really helps them. For English speakers, we deal with pretty much all
ASCII. And that's not the 1/2 of it. As you noted, if you want
to write unicode you also need a unicode font. Browsers make
it look simple, but for general text files it's not so simple. For
example, I like to use Verdana for most text. But the font
is not unicode. Windows will display UTF-8 as ANSI.
If I visit xinhuanet.com I see Chinese characters. (Even though
it's all Greek to me.) If I check the source code I see Chinese. If
I download that and open it in my code editor as UTF-8 with
Verdana font, I see some of the languages. It looks like I'm
getting Russian and Arabic, for example. But the Chinese is all
little boxes. If I open it in Notepad, since it's plain text with no
file header, it shows as English ANSI with lots of little boxes.
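What happens when UTF-8 bytes are read as an ANSI codepage can be reproduced in a few lines. This is a sketch under two assumptions: that "ANSI" here means Windows-1252, and that the "file header" referred to is the UTF-8 byte order mark.

```python
text = "Schrödinger"

# The UTF-8 bytes, misread as Windows-1252: each byte becomes its own character.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("cp1252")
print(misread)  # SchrÃ¶dinger

# The "utf-8-sig" codec prepends the three-byte mark EF BB BF when encoding...
with_bom = text.encode("utf-8-sig")
print(with_bom[:3])  # b'\xef\xbb\xbf'

# ...and strips it again when decoding.
restored = with_bom.decode("utf-8-sig")
print(restored == text)  # True
```

Without that mark or some other declaration, a plain text file gives the program nothing to go on except guesswork.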
So it's a good solution for webpages, but once you get into
entering, editing and storing multi-lingual text it gets very
complicated. Only for those of us who speak English is it
reasonable to say that UTF-8 makes everything easy. It does,
but only because it's usually exactly the same byte string as
ASCII. In fact, if I happen to come across
UTF-8 text or HTML code I'll generally convert it to ASCII/ANSI
for convenience. It's too much trouble trying to access it across
different programs and displays as UTF-8. On Linux, where that's
standard, it's fine. But we have to remember that this is
representational file encoding. UTF-8 by itself is no miracle.
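Both halves of that claim can be checked in Python. The sample strings are invented, and cp1252 is assumed as the "ANSI" target; errors="replace" is one of several possible policies for characters that do not fit.

```python
# Pure ASCII text is byte-for-byte identical in UTF-8.
plain = "naive text"
same_bytes = plain.encode("utf-8") == plain.encode("ascii")
print(same_bytes)  # True

# Converting mixed text down to a codepage keeps what fits and loses the rest.
mixed = "Schrödinger “cat” શ"
as_ansi = mixed.encode("cp1252", errors="replace")
as_ascii = mixed.encode("ascii", errors="replace")

print(as_ansi)   # b'Schr\xf6dinger \x93cat\x94 ?'
print(as_ascii)  # b'Schr?dinger ?cat? ?'
```

The conversion is lossless only as long as every character exists in the target codepage; anything else turns into a question mark.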
Microsoft are one of the sites that have used UTF-8 for years.
It's all English on their English pages, but they spec it as
UTF-8, use curly quotes and UTF-8 space characters. Neither
is necessary and it complicates things. Both of these will work
with an English codepage. The first should work anywhere:
“curly &#nbsp; quotes”
“curly   quotes”
And just to keep TNP happy here's some Gujarati: શ ણ ઊ ૐ . Hope it's not
rude.
Lucky I have gujarati capable fonts in use
"The Natural Philosopher" <[email protected]d> wrote
| Lucky I have gujarati capable fonts in use
|
|
| ᚠᚲ ᛗᛖ
|
Indeed. I see "as", "as" squared, a-, a-.
But I'm sure that's a very funny joke in India.
"TimS" <[email protected]> wrote
| You obviously need to get a new Usenet client.
|
No. I can only read English. I only write English.
It's of no value to see characters in languages I
can't read, even if I have the font. The nice thing
about not displaying in UTF-8 is that I don't have
to see emojis. I can just write a cranky note back
to people saying that I only see boxes. That usually
cures them of emoji mania. :)
"TimS" <[email protected]> wrote
| > No. I can only read English. I only write English.
| > It's of no value to see characters in languages I
| > can't read, even if I have the font. The nice thing
| > about not displaying in UTF-8 is that I don't have
| > to see emojis. I can just write a cranky not back
| > to people saying that I only see boxes. That usually
| > cures them of emoji mania. :)
|
| You're obviously making life too easy for yourself. Why not just junk all this
| useful software nonsense, and read the bits directly off the disk. All you
| need is a bar magnet, a magnifying glass, and some fine iron filings.
| Simples!
|
:) This seems to really get your goat. If you'll recall,
all I said was that UTF-8 was a good choice because
for English-speaking people and most webpages it was
an invisible transition. That might not be politically
correct, but it's true.
If you're on Linux and text files default to UTF-8 then
that's handy. You'll never need to know that the encoding
is not ASCII/ANSI. Since all my text files and HTML files are
essentially ASCII, I convert any UTF-8 I get to that.
For me UTF-8 is only corrupted text data.
For many people, UTF-8 is a great solution. That's fine.
But if you're going to send me funky characters for no reason,
in English, in a text-based medium, I see no reason to figure
out how to decipher it... And imagine my dismay at going to the
trouble only to find that someone has sent me 4 crying faces,
3 piles of shit, and an umbrella... or is that a soccer ball?
Or that they're trying to show off by sending some ditty in
Turkish or Russian... I still don't know what it means. I can't
read Turkish and Russian.
And what the heck does 4 crying faces and 3 piles of
shit and an umbrella mean? The sender is having a tantrum?
They've eaten too many prunes? They hate shitting? Or maybe
it's an inside joke. Maybe that's Beyonce's famous signature?
Sort of a "proud to be cranky" gimmick? Maybe it's Taylor
Swift's official breakup note? Maybe the sender is signalling
their fondness for some foul tempered rock star?
Who knows? It's
hardly an articulate expression. I pretty much ignore
emojis, anyway, for that reason. I usually don't know
what they mean. I just figured out that what I thought
was a corncob is probably "anjali" -- praying hands. So...
what?... a hippie is writing to me and they've developed
that irritating behavioral tic of bowing to express false
humility? ... Or maybe it really is a corncob. Beats me.
I need UTF-8 so that I can see such crap? I don't think so.