On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
2. Does the IDLE in Python 3.x behave the same way?
fwiw
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license()" for more information.
str = ""
for c in range(157, 169):
    str += chr(c) + ""
print(str)
¡¢£¤¥¦§¨
str = ""
for c in range(140, 169):
    str += chr(c) + " "
print(str)
¡ ¢ £ ¤ ¥ ¦ § ¨
I don't know how this will appear since Pan is showing the icon for a character not in its set. However, even with more undefined characters
the printable ones do not change. I get the same output running Python3
from the terminal so it's not an IDLE thing.
I have four questions.
1. Can anybody explain the behaviour in IDLE (Python version 2.7.10) reported below? (It seems that the way it renders a given sequence of bytes depends on the sequence.)
2. Does the IDLE in Python 3.x behave the same way?
3. If it does, is this as it should behave?
4. If it is, then why is it as it should behave?
==============================
mylongstr = ""
for thisCP in range (157, 169):
    mylongstr += chr (thisCP) + " "
print mylongstr
ン ゙ ゚ ᅠ ᄀ ᄁ ᆪ ᄂ ᆬ ᆭ ᄃ ᄄ
mylongstr = ""
for thisCP in range (158, 169):
    mylongstr += chr (thisCP) + " "
print mylongstr
ž Ÿ ¡ ¢ £ ¤ ¥ ¦ § ¨
mylongstr = ""
for thisCP in range (157, 169):
    mylongstr += chr (thisCP) + " "
print mylongstr
ン ゙ ゚ ᅠ ᄀ ᄁ ᆪ ᄂ ᆬ ᆭ ᄃ ᄄ
==============================
Stephen Tucker.
On 1/17/2023 8:46 PM, rbowman wrote:
On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
2. Does the IDLE in Python 3.x behave the same way?
fwiw
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license()" for more information.
str = ""
for c in range(140, 169):
    str += chr(c) + " "
print(str)
� � � � � � � � � � � � � � � � � � � � � � � � �
� � �
I don't know how this will appear since Pan is showing the icon for a character not in its set. However, even with more undefined characters
the printable ones do not change. I get the same output running Python3
from the terminal so it's not an IDLE thing.
I'm not sure what explanation is being asked for here. Let's take Python3, so we can be sure that the strings are in unicode. The font being used by the console isn't mentioned, but there's no reason it should have glyphs for any random unicode character.
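To make the Python 3 point concrete, here is a minimal sketch (Python 3 only; the variable names are illustrative and not from the thread). It shows that chr() in Python 3 returns a one-character Unicode string, whereas the chr() used in the Python 2.7 log returns a single byte, for which the closest Python 3 spelling is bytes([n]).

```python
# Python 3: chr() returns a str holding exactly one Unicode code point.
code_point = 163
as_text = chr(code_point)        # U+00A3 POUND SIGN

# Closest Python 3 analogue of Python 2's chr(163): a single raw byte.
as_byte = bytes([code_point])

print(type(as_text).__name__, len(as_text), hex(ord(as_text)))
print(type(as_byte).__name__, len(as_byte), as_byte)

# A lone byte only becomes a character once an encoding is chosen;
# Latin-1 maps byte values 0..255 directly onto code points 0..255.
print(as_byte.decode("latin-1") == as_text)
```

How a raw byte ends up on screen therefore depends on which decoding the display layer applies, which is a separate question from which glyphs the font provides.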
On 2023-01-17 22:58:53 -0500, Thomas Passin wrote:
On 1/17/2023 8:46 PM, rbowman wrote:
On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
2. Does the IDLE in Python 3.x behave the same way?
fwiw
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license()" for more information.
str = ""
for c in range(140, 169):
    str += chr(c) + " "
print(str)
Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ ¡ ¢ £ ¤ ¥
¦ § ¨
I don't know how this will appear since Pan is showing the icon for a character not in its set. However, even with more undefined characters the printable ones do not change. I get the same output running Python3 from the terminal so it's not an IDLE thing.
I'm not sure what explanation is being asked for here. Let's take Python3,
so we can be sure that the strings are in unicode. The font being used by
the console isn't mentioned, but there's no reason it should have glyphs for
any random unicode character.
Also note that the characters between 128 (U+0080) and 159 (U+009F)
inclusive aren't printable characters. They are control characters.
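
The control-character claim can be checked from Python 3 with the standard unicodedata module. This is only a small sketch; the variable names are made up for illustration.

```python
import unicodedata

# Code points 128..159 (U+0080..U+009F) are the C1 control characters.
c1_range = range(0x80, 0xA0)

# Collect the Unicode general category of every code point in that range.
c1_categories = {unicodedata.category(chr(cp)) for cp in c1_range}
print(c1_categories)   # general category "Cc" means "control"

# str.isprintable() agrees: none of them count as printable.
c1_printable = [cp for cp in c1_range if chr(cp).isprintable()]
print(c1_printable)

# From 161 up to 168 (the end of the range tested in this thread),
# every character is printable.
visible = [chr(cp) for cp in range(161, 169) if chr(cp).isprintable()]
print(" ".join(visible))
```

What a terminal or newsreader draws for a control character is up to that program, which would be consistent with the placeholder icons reported above.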
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
--
https://mail.python.org/mailman/listinfo/python-list
Thanks for these responses.
I was encouraged to read that I'm not the only one to find this all confusing.
I have investigated a little further.
1. I produced the following IDLE log:
mylongstr = ""
for thisCP in range (1, 256):
    mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "
print mylongstr
1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
, 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
126, 127, タ 128, チ 129, ツ 130, テ 131, ト 132, ナ 133, ニ 134, ヌ 135, ネ 136, ノ
137, ハ 138, ヒ 139, フ 140, ヘ 141, ホ 142, マ 143, ミ 144, ム 145, メ 146, モ 147,
ヤ 148, ユ 149, ヨ 150, ラ 151, リ 152, ル 153, レ 154, ロ 155, ワ 156, ン 157, ゙
158, ゚ 159, ᅠ 160, ᄀ 161, ᄁ 162, ᆪ 163, ᄂ 164, ᆬ 165, ᆭ 166, ᄃ 167, ᄄ 168,
ᄅ 169, ᆰ 170, ᆱ 171, ᆲ 172, ᆳ 173, ᆴ 174, ᆵ 175, ᄚ 176, ᄆ 177, ᄇ 178, ᄈ
179, ᄡ 180, ᄉ 181, ᄊ 182, ᄋ 183, ᄌ 184, ᄍ 185, ᄎ 186, ᄏ 187, ᄐ 188, ᄑ 189,
ᄒ 190, 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
ý 253, þ 254, ÿ 255,
2. I copied and pasted the IDLE log into a text file and ran a program on
it that told me about every byte in the log.
3. I discovered the following:
Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;
Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
characters whose codepoints were FF00 hex more than the byte values (hence the strange glyphs);
Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
characters - without any offset being added to their codepoints in the meantime!
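
The FF00 offset in the second finding can be reproduced in Python 3. This is only a sketch of the arithmetic described above, not an account of what IDLE or Tk does internally; OFFSET and shifted_char are hypothetical names introduced for illustration.

```python
import unicodedata

OFFSET = 0xFF00  # the offset reported in the findings above


def shifted_char(byte_value):
    """Return the character whose code point is byte_value + OFFSET."""
    return chr(byte_value + OFFSET)


for byte_value in (128, 157, 159):
    ch = shifted_char(byte_value)
    # Show the shifted code point, its Unicode name and its UTF-8 bytes.
    print(byte_value, hex(ord(ch)), unicodedata.name(ch), ch.encode("utf-8"))
```

Byte 128 lands on U+FF80, a halfwidth katakana letter, which fits the katakana-looking glyphs in the log; each shifted character takes three bytes in UTF-8.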
I thought you might just be interested in this - there does seem to be some method in IDLE's mind, at least.
Stephen Tucker.
On Wed, Jan 18, 2023 at 9:41 AM Peter J. Holzer <[email protected]> wrote:
Also note that the characters between 128 (U+0080) and 159 (U+009F)
inclusive aren't printable characters. They are control characters.
On 1/18/2023 5:43 AM, Stephen Tucker wrote:
3. I discovered the following:
Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;
Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded characters whose codepoints were FF00 hex more than the byte values (hence the strange glyphs);
Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded characters - without any offset being added to their codepoints in the meantime!
I thought you might just be interested in this - there does seem to be some method in IDLE's mind, at least.
This has nothing to do with IDLE. The UTF-8 encoding of those code points uses two bytes instead of one. See
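
As a quick, independent check of the two-byte claim, the following Python 3 sketch measures the UTF-8 length of every code point from 1 to 255. The helper name utf8_length is made up for this example.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))


# Partition code points 1..255 by how many UTF-8 bytes they need.
one_byte = [cp for cp in range(1, 256) if utf8_length(cp) == 1]
two_bytes = [cp for cp in range(1, 256) if utf8_length(cp) == 2]

print(one_byte[0], one_byte[-1])     # first and last single-byte code points
print(two_bytes[0], two_bytes[-1])   # first and last two-byte code points
print(chr(192).encode("utf-8"))      # UTF-8 bytes for code point 192
```

Code points 1 to 127 encode as one byte and 128 to 255 as two, so a byte-by-byte dump of a UTF-8 text file will not match the original byte values above 127.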