Forum: >>> Magnum BBS <<<

locale/LC_CTYPE vs strcasecmp?

From Winston@21:1/5 to All on Tue Mar 26 06:24:31 2024

In FreeBSD 14.0-RELEASE:

The man page says strcasecmp_l() takes an explicit locale.
The implication is that strcasecmp() uses the current locale
(presumably as set by setlocale()).

After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
strcasecmp() is not, in fact, case-independently matching non-ASCII
UTF-8 strings: it's case sensitive (the ASCII equivalent in this
case being that "Abc" isn't matching "abc").

Is that a bug, does strcasecmp not, in fact, use the current
locale, or am I missing something?

TIA,
-WBE

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christian Weisgerber@21:1/5 to Winston on Tue Mar 26 19:47:03 2024

On 2024-03-26, Winston wrote:

> The man page says strcasecmp_l() takes an explicit locale.
> The implication is that strcasecmp() uses the current locale
> (presumably as set by setlocale()).

Yes.
src/lib/libc/string/strcasecmp.c:

    int
    strcasecmp(const char *s1, const char *s2)
    {
            return strcasecmp_l(s1, s2, __get_locale());
    }

> After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
> strcasecmp() is not, in fact, case-independently matching non-ASCII
> UTF-8 strings: it's case sensitive (the ASCII equivalent in this
> case being that "Abc" isn't matching "abc").

UTF-8 characters are multibyte. You need to convert the strings
to wide characters and use wcscasecmp().
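
A minimal sketch of that conversion, assuming a POSIX libc. The names mb_to_wide() and mb_casecmp() are hypothetical helpers, not libc functions. The sketch decodes each string with mbstowcs(), which follows LC_CTYPE of the current locale, and then hands the wide strings to wcscasecmp().

```c
#define _POSIX_C_SOURCE 200809L	/* for wcscasecmp() on strict compilers */

#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode a multibyte string into a freshly
 * allocated wide string using the current locale's LC_CTYPE.
 * Returns NULL on an invalid sequence or allocation failure.
 */
static wchar_t *
mb_to_wide(const char *s)
{
	size_t n;
	wchar_t *w;

	assert(s != NULL);
	n = mbstowcs(NULL, s, 0);	/* length in wide chars, without NUL */
	if (n == (size_t)-1)
		return NULL;
	w = malloc((n + 1) * sizeof(*w));
	if (w == NULL)
		return NULL;
	mbstowcs(w, s, n + 1);		/* second pass does the conversion */
	return w;
}

/*
 * Hypothetical helper: case-insensitive comparison of two multibyte
 * strings. Stores the wcscasecmp() result in *result and returns 0,
 * or returns -1 (leaving *result untouched) if either string could
 * not be converted.
 */
int
mb_casecmp(const char *s1, const char *s2, int *result)
{
	wchar_t *w1 = mb_to_wide(s1);
	wchar_t *w2 = mb_to_wide(s2);
	int ok = (w1 != NULL && w2 != NULL);

	if (ok)
		*result = wcscasecmp(w1, w2);
	free(w1);
	free(w2);
	return ok ? 0 : -1;
}
```

The caller is expected to have run setlocale(LC_ALL, "uk_UA.UTF-8") (or another UTF-8 locale) beforehand; without it the strings are decoded in the "C" locale. Note that wcscasecmp() folds one wide character at a time, so this is simple case mapping rather than full Unicode case folding.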

--
Christian "naddy" Weisgerber

From Winston@21:1/5 to Christian Weisgerber on Wed Mar 27 11:16:40 2024

I originally posted:

>> The man page says strcasecmp_l() takes an explicit locale.
>> The implication is that strcasecmp() uses the current locale
>> (presumably as set by setlocale()).

to which Christian Weisgerber kindly replied:

> Yes.
> src/lib/libc/string/strcasecmp.c:
>
>     int
>     strcasecmp(const char *s1, const char *s2)
>     {
>             return strcasecmp_l(s1, s2, __get_locale());
>     }

:-)

>> After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
>> strcasecmp() is not, in fact, case-independently matching non-ASCII
>> UTF-8 strings: it's case sensitive (the ASCII equivalent in this
>> case being that "Abc" isn't matching "abc").
>
> UTF-8 characters are multibyte. You need to convert the strings
> to wide characters and use wcscasecmp().

As one would expect and perfectly reasonable, but something (I forget
what now) led me to think that if strcasecmp accepted UTF-8 locales,
maybe it *would* be willing to, just operating one byte at a time
instead of two.

Thanks for confirming that, Christian. Onward to upgrading this
code that should have been doing that already ...
-WBE