• Bug#253170: UTF-8 support in CenterICQ

    From Paul Hampson@1:229/2 to All on Fri Aug 20 20:30:11 2004
    From: [email protected]

    I've been looking into this, and actually UTF-8 support
    _does_ work in the current CenterICQ build.

    However, you have to put your local chartype into the
    "local chartype" box in CenterICQ's config options.

    This is enough to make UTF-8 work over the UTF-8-native
    transports MSN and Yahoo!. However, ICQ (and I presume AIM)
    tend to send their data in whatever format seems best, and
    it appears to be up to the client. In that case, you need to
    put your partner's codepage into the remote chartype box,
    and enable codepage conversion for that transport.
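
    As a rough illustration of what codepage conversion for a transport
    amounts to, here is a minimal Python sketch. It is not CenterICQ's
    code: the function name convert_charset is hypothetical, and the
    partner's codepage (CP1250) is only an assumed example.

```python
def convert_charset(data: bytes, remote_charset: str, local_charset: str) -> bytes:
    """Re-encode bytes received in the partner's charset into the local charset."""
    text = data.decode(remote_charset)   # interpret the bytes as the partner sent them
    return text.encode(local_charset)    # re-encode for the local terminal


# Assumption for the example: the partner's client sends CP1250.
incoming = "zażółć".encode("cp1250")
local = convert_charset(incoming, "cp1250", "utf-8")
print(local.decode("utf-8"))
```

    If the remote charset is guessed wrongly, the decode step either
    fails or silently yields the wrong letters, which is why the value
    has to match what the partner's client really sends.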

    If you put nothing in the chartype boxes, they default to
    ISO-8859-1. This produces an interesting effect where
    UTF-8 encoded data gets UTF-8 encoded _again_, byte by
    byte, producing very strange output indeed.
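
    The double-encoding effect is easy to reproduce outside CenterICQ.
    The sketch below (plain Python, with the hypothetical helper
    double_encode) treats UTF-8 bytes as if they were ISO-8859-1 and
    encodes them to UTF-8 a second time:

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Misread UTF-8 bytes as ISO-8859-1, then encode the result to UTF-8 again."""
    return utf8_bytes.decode("iso-8859-1").encode("utf-8")


original = "ążłóęćńł"
once = original.encode("utf-8")    # what should go out on the wire
twice = double_encode(once)        # what goes out after the extra conversion

print(len(once), len(twice))       # 16 32
print(repr(twice.decode("utf-8")))
```

    Every byte of the original is above 0x7F, so each one turns into
    two bytes on the second pass and the message doubles in size.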

    Ideally, the default for the local charset would be governed
    by LC_CTYPE, but that's not (yet) been implemented.
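
    For illustration only, such a default could be derived along these
    lines. This is a sketch, not anything CenterICQ implements:
    charset_from_locale is a hypothetical helper that follows the usual
    POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and falls back
    to ISO-8859-1, the default mentioned above.

```python
import os


def charset_from_locale(env=None, fallback="ISO-8859-1"):
    """Pick the codeset from the first non-empty of LC_ALL, LC_CTYPE, LANG."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if not value:
            continue                       # unset or empty: try the next variable
        # Locale names look like language_TERRITORY.codeset@modifier
        value = value.split("@", 1)[0]     # drop an optional @modifier
        if "." in value:
            return value.split(".", 1)[1]  # the codeset part, e.g. "UTF-8"
        return fallback                    # e.g. "C" names no codeset
    return fallback


print(charset_from_locale({"LANG": "en_AU.UTF-8", "LC_ALL": ""}))
```

    A real program would more likely call setlocale() and ask the C
    library for the codeset rather than parse the variables itself.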

    --
    Paul "TBBle" Hampson, [email protected]
    7th year CompSci/Asian Studies student, ANU

    Shorter .sig for a more eco-friendly paperless office.

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Miernik@1:229/2 to Paul Hampson on Fri Aug 20 21:00:25 2004
    From: [email protected]

    On Sat, Aug 21, 2004 at 04:00:33AM +1000, Paul Hampson wrote:
    > I've been looking into this, and actually UTF-8 support
    > _does_ work in the current CenterICQ build.
    >
    > However, you have to put your local chartype into the
    > "local chartype" box in CenterICQ's config options.

    I just set it like this:

    ├─ Codepages conversion
    │ ├─ Switch to language preset : None
    │ ├─ Remote charset :
    │ ├─ Local charset : utf-8
    │ └─ For protocols :


    And my Jabber recipient gets garbage if I type UTF-8 Central European
    Polish accented letters: ążłóęćńł

    How do you set it?

    --
    Miernik _________________________ xmpp:[email protected] ___________________/__ tel: +48888299997 __/ mailto:[email protected] http://www.miernik.ctnet.pl/

  • From Paul Hampson@1:229/2 to Miernik on Sat Aug 21 06:40:05 2004
    From: [email protected]

    On Fri, Aug 20, 2004 at 08:42:41PM +0200, Miernik wrote:
    > On Sat, Aug 21, 2004 at 04:00:33AM +1000, Paul Hampson wrote:
    > > I've been looking into this, and actually UTF-8 support
    > > _does_ work in the current CenterICQ build.
    > >
    > > However, you have to put your local chartype into the
    > > "local chartype" box in CenterICQ's config options.
    >
    > I just set like this:
    >
    > ├─ Codepages conversion
    > │ ├─ Switch to language preset : None
    > │ ├─ Remote charset :
    > │ ├─ Local charset : utf-8
    > │ └─ For protocols :
    >
    > And my Jabber recipient gets garbage if I type UTF-8 Central European
    > Polish accented letters: ążłóęćńł

    That's _exactly_ like I have it, and it works for MSN. I can send
    Japanese back and forth without problems... And looking at the
    Jabber code, it handles the data the same way,
    rusconv("ku", text);
    which is a null-operation with the charset config above.
    ("ku" mode only uses local charset, it's local->utf conversion)
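
    To make the "null-operation" point concrete, here is a small Python
    sketch of a local->utf conversion. It does not reproduce rusconv's
    internals, which are not shown in this thread; local_to_utf8 is a
    hypothetical stand-in.

```python
def local_to_utf8(data: bytes, local_charset: str) -> bytes:
    """Convert bytes typed in the local charset to UTF-8 for the wire."""
    return data.decode(local_charset).encode("utf-8")


typed = "日本語".encode("utf-8")

# With the local charset set to utf-8, the conversion changes nothing.
print(local_to_utf8(typed, "utf-8") == typed)

# With a different local charset, the bytes really are rewritten.
latin2 = "ą".encode("iso-8859-2")
print(local_to_utf8(latin2, "iso-8859-2") == "ą".encode("utf-8"))
```

    Both comparisons hold, so with "Local charset : utf-8" whatever is
    typed should reach the protocol library unchanged.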

    I don't use Jabber myself, but I'd suggest you use ethereal to
    packet-capture an outgoing message containing just those
    characters, and also try
    echo '<characters>' | hd
    so you can compare the bytes actually going out with what they
    should be. That'll give a good clue as to what's actually
    happening...
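
    For reference, the expected bytes can also be computed without a
    shell. This is a minimal Python sketch (hex_bytes is a hypothetical
    helper) that prints the UTF-8 encoding of the letters from the
    earlier message:

```python
def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


print(hex_bytes("ążłóęćńł"))
# c4 85 c5 bc c5 82 c3 b3 c4 99 c4 87 c5 84 c5 82
```

    Unlike echo, this adds no trailing newline (0a), so the output can
    be compared directly with the message body in the capture.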

    My _suspicion_ is that the jabber library is treating the text
    it is being given as not being UTF-8 and UTF-8 encoding it
    _again_, where the MSN library treats its supplied input as UTF-8.
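
    One way to test that suspicion against a captured payload is
    sketched below in Python. classify_capture is a hypothetical
    helper, and the check is only a heuristic: some legitimate text
    also survives the round trip and would be misreported.

```python
def classify_capture(payload: bytes) -> str:
    """Guess whether captured bytes are plain UTF-8 or UTF-8 encoded twice."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    if text.isascii():
        return "ascii"            # pure ASCII looks the same either way
    try:
        # Undo one assumed ISO-8859-1 -> UTF-8 pass and see if UTF-8 remains.
        text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded"


good = "ążł".encode("utf-8")
bad = good.decode("iso-8859-1").encode("utf-8")
print(classify_capture(good), classify_capture(bad))
```

    If the Jabber capture comes back as "double-encoded" while the MSN
    one comes back as "utf-8", that would support the theory above.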

    My locale settings are:

    LANG=en_AU.UTF-8
    LC_CTYPE="en_AU.UTF-8"
    LC_NUMERIC="en_AU.UTF-8"
    LC_TIME="en_AU.UTF-8"
    LC_COLLATE="en_AU.UTF-8"
    LC_MONETARY="en_AU.UTF-8"
    LC_MESSAGES="en_AU.UTF-8"
    LC_PAPER="en_AU.UTF-8"
    LC_NAME="en_AU.UTF-8"
    LC_ADDRESS="en_AU.UTF-8"
    LC_TELEPHONE="en_AU.UTF-8"
    LC_MEASUREMENT="en_AU.UTF-8"
    LC_IDENTIFICATION="en_AU.UTF-8"
    LC_ALL=

    --
    -----------------------------------------------------------
    Paul "TBBle" Hampson, MCSE
    7th year CompSci/Asian Studies student, ANU
    The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
    [email protected]

    "No survivors? Then where do the stories come from I wonder?"
    -- Capt. Jack Sparrow, "Pirates of the Caribbean"

    This email is licensed to the recipient for non-commercial
    use, duplication and distribution.
    -----------------------------------------------------------
