• Bug#265163: locales: locale.alias aliases some names to unsupported loc

    From Branden Robinson@1:229/2 to All on Thu Aug 12 02:50:06 2004
    XPost: linux.debian.maint.glibc
    From: [email protected]

    This is a multi-part MIME message sent by reportbug.

    Package: locales
    Version: 2.3.2.ds1-15
    Severity: normal
    Tags: upstream

    Some of the locale aliases in /etc/locale.alias map names to unsupported locales. Namely, "eucJP" and "eucKR" aren't spelled correctly per /usr/share/i18n/SUPPORTED, and the "SJIS" codeset isn't supported at all.

    I'm attaching two files:

    * A Python script I wrote that found this problem.
    * A patch to correct the problem. I corrected all but one problem; I had
    to drop the alias for "japanese.sjis", as adding support for the SJIS
    character set to glibc is beyond my ability, and I don't even know if
    that's a desirable solution.

    Thanks for looking into this.

    -- System Information:
    Debian Release: 3.1
    APT prefers unstable
    APT policy: (500, 'unstable')
    Architecture: powerpc (ppc)
    Kernel: Linux 2.4.25-powerpc-smp
    Locale: LANG=C, LC_CTYPE=en_US.UTF-8

    Versions of packages locales depends on:
    ii debconf 1.4.30 Debian configuration management sy
    ii libc6 [glibc-2.3.2.ds1-15] 2.3.2.ds1-15 GNU C Library: Shared libraries an

    -- debconf information:
    * locales/default_environment_locale: None
    * locales/locales_to_be_generated: en_US ISO-8859-1, en_US.ISO-8859-15 ISO-8859-15, en_US.UTF-8 UTF-8

    #!/usr/bin/python

    import os
    import re

    RUNTIME_DEBUG = True

    # Build a dictionary of canonical locales according to the GNU C library. The
    # keys in this dictionary are the locale names, and the values are the character
    # sets used by each locale name.

    glibc_locales_canonical = { }
    glibc_locale_file = open(os.path.join("/", "usr", "share", "i18n", "SUPPORTED"))

    for line in glibc_locale_file.readlines():
        (left_side, right_side) = re.split(r'\s', line, 1)
        glibc_locales_canonical[(left_side.strip())] = right_side.strip()

    glibc_locale_file.close()

    if RUNTIME_DEBUG:
        print "Canonical glibc locales: %s" % (glibc_locales_canonical.keys(),)

    glibc_locales_aliased = { }
    glibc_alias_file = open(os.path.join("/", "etc", "locale.alias"))

    for line in glibc_alias_file.readlines():
        # Ignore blank lines and lines beginning with a comment character.
        if re.match(r'$', line) \
           or re.match(r'#', line):
            continue
        (left_side, right_side) = re.split(r'\s', line, 1)
        glibc_locales_aliased[(left_side.strip())] = right_side.strip()
        # glibc is a little weird; it aliases names to locale specifications
        # *including* the codeset, whereas it omits the codeset from the officially
        # supported list except when necessary for disambiguation purposes.
        # Consequently, if we don't find the alias's target in the canonical list,
        # we have to fall back to seeing if it is in the canonical list using the
        # same codeset that is explicitly stated.
        if right_side.strip() not in glibc_locales_canonical.keys():
            # Try harder to find it.
            goal_locale = right_side.strip()
            found = False
            for locale in glibc_locales_canonical.keys():
                # re.search, not re.match: the dot can appear anywhere in the name.
                if not re.search(r'\.', locale):
                    locale_with_codeset = '.'.join([ locale,
                        glibc_locales_canonical[locale] ])
                    if goal_locale == locale_with_codeset:
                        found = True
                        break
            if not found:
                print "Warning: glibc bug: glibc locale %s is aliased to" \
                      " non-canonical glibc locale %s" \
                      % (left_side.strip(), right_side.strip())

    glibc_alias_file.close()

    if RUNTIME_DEBUG:
        print "Aliased glibc locales: %s" % (glibc_locales_aliased.keys(),)

    # vim:set ai et sts=4 sw=4 tw=80:
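
For readers on current systems, the check performed by the script above can be sketched in Python 3. This is a minimal sketch rather than the attached script itself: the helper names parse_pairs and unsupported_aliases are hypothetical, and the sample strings stand in for /usr/share/i18n/SUPPORTED and /etc/locale.alias instead of reading the real files.

```python
def parse_pairs(text):
    """Map the first whitespace-separated field of each line to the rest."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) == 2:
            pairs[fields[0]] = fields[1].strip()
    return pairs


def unsupported_aliases(supported_text, alias_text):
    """Return (alias, target) pairs whose target is not a supported locale."""
    supported = parse_pairs(supported_text)
    # Accept both the bare locale name and the name with its codeset appended,
    # because the supported list omits the codeset unless it disambiguates.
    known = set(supported)
    for locale, codeset in supported.items():
        if "." not in locale:
            known.add(locale + "." + codeset)
    return [(alias, target)
            for alias, target in parse_pairs(alias_text).items()
            if target not in known]


# Hypothetical excerpts for illustration, not the contents of the real files.
SUPPORTED = "ja_JP.EUC-JP EUC-JP\nit_IT ISO-8859-1\n"
ALIASES = "# comment\nitalian it_IT.ISO-8859-1\njapanese ja_JP.eucJP\n"

print(unsupported_aliases(SUPPORTED, ALIASES))
```

To run it against a real system, pass the contents of the two files in place of the sample strings.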

    --- /etc/locale.alias.dpkg-dist 2004-08-11 19:15:44.000000000 -0500
    +++ /etc/locale.alias 2004-08-11 19:17:57.000000000 -0500
    @@ -49,14 +49,13 @@
     hungarian hu_HU.ISO-8859-2
     icelandic is_IS.ISO-8859-1
     italian it_IT.ISO-8859-1
    -japanese ja_JP.eucJP
    -japanese.euc ja_JP.eucJP
    -ja_JP ja_JP.eucJP
    -ja_JP.ujis ja_JP.eucJP
    -japanese.sjis ja_JP.SJIS
    -korean ko_KR.eucKR
    -korean.euc ko_KR.eucKR
    -ko_KR ko_KR.eucKR
    +japanese ja_JP.EUC-JP
    +japanese.euc ja_JP.EUC-JP
    +ja_JP ja_JP.EUC-JP
    +ja_JP.ujis ja_JP.EUC-JP
    +korean ko_KR.EUC-KR
    +korean.euc ko_KR.EUC-KR
    +ko_KR ko_KR.EUC-KR
     lithuanian lt_LT.ISO-8859-13
     norwegian no_NO.ISO-8859-1
     nynorsk nn_NO.ISO-8859-1

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From GOTO Masanori@1:229/2 to Branden Robinson on Thu Aug 12 05:10:08 2004
    XPost: linux.debian.maint.glibc
    From: [email protected]

    At Wed, 11 Aug 2004 19:22:35 -0500,
    Branden Robinson wrote:
    > Package: locales
    > Version: 2.3.2.ds1-15
    > Severity: normal
    > Tags: upstream
    >
    > Some of the locale aliases in /etc/locale.alias map names to unsupported locales. Namely, "eucJP" and "eucKR" aren't spelled correctly per /usr/share/i18n/SUPPORTED, and the "SJIS" codeset isn't supported at all.

    There are several points to make on this issue:

    (1) /etc/locale.alias is obsolete. Upstream said that, unfortunately,
    they have no plan to change it, and I agree with them. Don't use
    the obsolete /etc/locale.alias file on recent systems.

    (2) LSB defines the standard locale names. It's provided for backward
    compatibility with old applications.

    (3) At least eucJP is correct and still valid. We defined it
    following the Japanese vendors group. Both EUC-JP and eucJP should
    be available. eucJP is incompatible with LSB, but LSB is very new
    for us. eucJP has been used for over five years on Linux, and it
    was available for over ten years on other Unix systems. At that
    time LSB and openi18n did not exist.

    (4) eucJP and eucKR can be handled inside glibc's locale code,
    because gconv-modules recognizes them as alias names.

    (5) Note that some apt-related tools (which are written in Python) had
    this problem with eucJP vs EUC-JP. It has been corrected.
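
Point (5) concerns Python tools that tripped over the eucJP versus EUC-JP spelling. As an illustration only, the sketch below asks Python's own codec registry whether two spellings resolve to the same codec. That registry is a separate alias table from glibc's gconv-modules, so this says nothing about how glibc itself resolves the names. The helper same_codec is a hypothetical name introduced here.

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


# Compare the spellings discussed in this thread.
for old, new in [("eucJP", "EUC-JP"), ("eucKR", "EUC-KR")]:
    print(old, new, same_codec(old, new))
```

A tool that compares codeset names as plain strings would treat these spellings as different, which is one way such a mismatch can arise.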

    > I'm attaching two files:
    >
    > * A Python script I wrote that found this problem.
    > * A patch to correct the problem. I corrected all but one problem; I had
    > to drop the alias for "japanese.sjis", as adding support for the SJIS
    > character set to glibc is beyond my ability, and I don't even know if
    > that's a desirable solution.

    Dropping japanese.sjis is certainly an unacceptable choice.

    So, for the above reasons, I at least don't accept this patch. If you
    have no objection, I'll close it.

    Regards,
    -- gotom


