Interestingly, I note that strtoul() accepts strings that begin with a sign (+ or -). This is odd, since you'd (*) think that a sign (particularly, a minus) would be a syntax error in parsing for an unsigned value.
Further, although the (Linux) man page is more than a bit murky on the subject, it seems that the result of parsing, say, "-1", with strtoul() is the largest unsigned value (usually, 2**N-1 or a lot of F's (in hex)). Whereas, I would expect it to be 1 (i.e., just take the absolute value).
Comments? I find this all very counterintuitive.
(*) Or should I say, "one would" ?
P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.
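For anyone who wants to see this for themselves, here is a minimal sketch. The helper name parse_ul is made up for illustration, and the behaviour described is what common implementations (glibc, for one) appear to do, not something checked on every platform.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: run strtoul() in base 10 and report through
   *range_error whether it set errno to ERANGE. */
unsigned long parse_ul(const char *s, int *range_error)
{
    errno = 0;                          /* strtoul() never clears errno itself */
    unsigned long v = strtoul(s, NULL, 10);
    *range_error = (errno == ERANGE);
    return v;
}
```

With this, parse_ul("-1", &e) should come back as the largest unsigned long with e left at 0, and parse_ul("+5", &e) as plain 5.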
If strtol didn't exist today, making it necessary to invent it or
something like it, that function should use the intmax_t type.
Then there wouldn't be any need to add new variants going forward.
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)
[email protected] (Kenny McCormack) wrote:
Yeah, now I get it. You really only need strtoimax() and
strtoumax().
Which are? unfortunately, not part of C standard.
A result of any smaller type can be obtained by calling one of these functions and storing the result in an object of the smaller type.
Or check for range and handle out of range values as appropriate by situation.
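A rough sketch of the "check for range" variant, assuming the goal is a plain int. The name parse_int and its return codes are invented for this example; only strtoimax() itself comes from the library.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>

/* Hypothetical helper: parse a decimal int by way of strtoimax().
   Returns 0 on success, -1 on a syntax error, -2 if out of range for int. */
int parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    intmax_t v = strtoimax(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;                      /* no digits at all, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -2;                      /* too big for intmax_t, or just for int */
    *out = (int)v;
    return 0;
}
```

Simply assigning the intmax_t result to an int would compile too, but an out-of-range value would then be converted in an implementation-defined way, so the explicit comparison seems the safer habit.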
In article <v51d1l$2fklr$[email protected]>,
2) Because it means that the two functions are literally the same
code. Both calculate the same bit pattern - the difference is only in
the caller's interpretation of the result.
Yeah, now I get it. You really only need strtoimax() and strtoumax().
A result of any smaller type can be obtained by calling one of these functions and storing the result in an object of the smaller type.
BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.
SuS defines ERANGE as the errno returned if the converted value is out of range.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/strtoull.html
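In practice that means clearing errno before the call and testing it afterwards. A small sketch (parse_ull is a made-up name, and the return codes are just a convention chosen for this example):

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 on success, -1 if strtoull() reported
   that the converted value was out of range. */
int parse_ull(const char *s, unsigned long long *out)
{
    errno = 0;                          /* so a stale ERANGE is not misread */
    *out = strtoull(s, NULL, 10);
    return (errno == ERANGE) ? -1 : 0;
}
```

Checking errno rather than comparing the result against ULLONG_MAX matters because a string can legitimately spell out ULLONG_MAX itself.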
Yeah, now I get it. You really only need strtoimax() and
strtoumax().
Which are? unfortunately, not part of C standard.
They have been part of the C standard since C99.
strto[u]l[l] are declared in <stdlib.h>; strtoimax and strtoumax are
declared in <inttypes.h>, which can make them easy to miss.
It should be quite clear what is stored at endptr in all cases from the
POSIX description.
You really need to be checking the C spec, just in case.
Ben Bacarisse <[email protected]> writes:
Michael S <[email protected]> writes:[...]
Which are? unfortunately, not part of C standard.
Not sure if that '?' is just a typo. Anyway, yes they are both
part of the C standard.
BTW, I don't know what The Standard says about out-of-range inputs,
but at least https://en.cppreference.com/w/c/string/byte/strtol
does not say anything certain, especially about what is stored in
*str_end.
"The strtoimax and strtoumax functions are equivalent to the strtol,
strtoll, strtoul, and strtoull functions, except that the initial
portion of the string is converted to intmax_t and uintmax_t
representation, respectively." (7.8.2.3p2)
You need to go to the descriptions of those other functions to get the detailed specifications.
"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any),
and the value of the macro ERANGE is stored in errno."
As I understand it, that means that if the input string represents a
value outside of the range of representable values, then strtoimax()
should return INTMAX_MIN or INTMAX_MAX, depending upon the sign, and
strtoumax() should return UINTMAX_MAX. Both of them should store the
value of ERANGE in errno, to distinguish these results from what you
would get if the string happened to represent those values.
The C standard uses endptr rather than str_end in its description of
these functions.
"... First, they decompose the input string into three parts: an
initial, possibly empty, sequence of white-space characters, a subject sequence resembling an integer represented in some radix determined by
the value of base, and a final string of one or more unrecognized
characters, including the terminating null character of the input
string. ..." (7.24.1.7p2).
That defines what the "final string" is.
"If the subject sequence has the expected form, ... A pointer to the
final string is stored in the object pointed to by endptr, provided
that endptr is not a null pointer." (7.24.1.7p5).
"If the subject sequence is empty or does not have the expected form
... the value of nptr is stored in the object pointed to by endptr,
provided that endptr is not a null pointer." (7.24.1.7p7)
That seems very precise and unambiguous to me, aside from what "the
expected form" is, which is described elsewhere.
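To make the endptr rules concrete, here is a small sketch that reports how far into the string the "final string" starts. The helper name consumed is made up for this example, and the results noted below are what the quoted wording implies for base 10.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: number of characters strtol() skipped or converted,
   i.e. the offset of the "final string" within s. */
size_t consumed(const char *s)
{
    char *end;
    (void)strtol(s, &end, 10);          /* only the end pointer is of interest */
    return (size_t)(end - s);
}
```

So consumed("  42abc") should be 4 (two blanks plus two digits), while consumed("abc"), consumed("") and consumed("-") should all be 0, because in those cases nptr itself is stored. An out-of-range digit string has the expected form, so endptr should still land just past its last digit.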
There have been some useful responses on this thread, which is Good. Of course, there have also been the usual crappola-type responses, but one must learn to take the good with the bad.
Anyway, I think the takeaway is that while it is what it is, an argument
can certainly be made that it would have been better for the unsigned
versions of these functions to not accept signed input. If I were
designing it, I would have had strtoul("-1") be a syntax error (not a C language
syntax error - but a meta-language syntax error) - or, if not that, then
have it return 1, not 2**N-1. But that's just me.
I appreciate the responses indicating that it was probably done the way it was for actually both of these reasons:
1) Because it makes it more useful for C compiler writers - who were
seen as the primary audience.
2) Because it means that the two functions are literally the same code.
Both calculate the same bit pattern - the difference is only in the
caller's interpretation of the result.
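For those who do want the "a sign is a syntax error" behaviour, it is easy enough to layer on top of strtoul(). A sketch follows; the name parse_ulong_strict and the return codes are invented here, and it is stricter than the library in that it also refuses leading white space and trailing junk.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict wrapper around strtoul(), base 10 only.
   Returns 0 on success, -1 on a syntax error, -2 on overflow. */
int parse_ulong_strict(const char *s, unsigned long *out)
{
    char *end;
    if (!isdigit((unsigned char)s[0]))
        return -1;                      /* rejects "", "+1", "-1" and " 1" */
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (*end != '\0')
        return -1;                      /* trailing characters after the digits */
    if (errno == ERANGE)
        return -2;
    *out = v;
    return 0;
}
```

Testing the first character up front is what keeps the sign (and the white space) from ever reaching strtoul().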
Lawrence D'Oliveiro <[email protected]d> writes:...
You really need to be checking the C spec, just in case.
No, I don't. The POSIX document clearly states that the text
is from ISO C (and clearly marks any extensions).
But frankly, I expected that cppreference.com would do better.
They have been part of the C standard since C99.
To some people, "Standard C" means C89.
Everything after that is, like POSIX, just fluffy nonsense.
BTW, I don't know what The Standard says about out-of-range inputs,
but at least https://en.cppreference.com/w/c/string/byte/strtol
does not say anything certain, especially about what is stored in
*str_end.
It says what value should be returned. That's something certain!
In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.
Also, at least to me, the Standard text itself appears very far from clear
and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behaves in this manner.
It seems the problem is that what is considered the "range of representable
values" for an unsigned type is by itself open to interpretation.
IMHO, even if in some part of the standard there exists text that
clearly states that "range of representable values for unsigned long =
[-ULONG_MAX:ULONG_MAX]", it is worth repeating that in the section that
defines strtol, because it is not at all intuitive.
In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.
Ah. What's the discrepancy you see?
Also, at least to me, the Standard text itself appears very far from
clear and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behaves in this manner.
I don't get that from the text. There is, after all, no "negative
input". There is a "subject sequence" which, if it starts with a
minus sign, causes the "value resulting from the conversion is
negated (in the return type)" which seems clear enough.
It seems the problem is that what is considered the "range of
representable values" for an unsigned type is by itself open to
interpretation.
IMHO, even if in some part of the standard there exists text that
clearly states that "range of representable values for unsigned
long = [-ULONG_MAX:ULONG_MAX]", it is worth repeating that in the
section that defines strtol, because it is not at all intuitive.
I don't get what you are saying here. The range of values is
[0:ULONG_MAX].
Ah. What's the discrepancy you see?
IMHO, the Standard text allows for more interpretations (and misinterpretations) than the cppreference.com text.
I don't get that from the text. There is, after all, no "negative
input". There is a "subject sequence" which, if it starts with a
minus sign, causes the "value resulting from the conversion is
negated (in the return type)" which seems clear enough.
I find it less than clear.
The least clear part is that for strtouxx(), as long as the "subject
sequence" is in range, it is first converted and then negated. However,
when the "subject sequence" is out of range, it is converted, then clipped,
and then *not* negated.
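The two cases can be put side by side with a small sketch. The helper name was_clipped is invented for this example, and the results described are those seen with common implementations such as glibc, not a guarantee for every library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtoul() clipped the result to
   ULONG_MAX and set ERANGE, 0 if it returned a value without complaint. */
int was_clipped(const char *s, unsigned long *out)
{
    errno = 0;
    *out = strtoul(s, NULL, 10);
    return errno == ERANGE && *out == ULONG_MAX;
}
```

For "-3" the digits are in range, so the converted 3 is negated in unsigned arithmetic and the call should yield ULONG_MAX - 2 with no error. For a minus sign followed by far too many digits, the result should be ULONG_MAX with ERANGE, that is, clipped and not negated.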
Michael S <[email protected]> writes:
As I've used these functions for
decades, I find it hard to see where the alternative interpretations
might lie.
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.
I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.
Can you give an example where the specified behavior causes undefined behavior?
I don't want to pre-empt Tim's answer, but the wording that bothers me
is
"If the subject sequence begins with a minus sign, the value
resulting from the conversion is negated (in the return type)."
For strtoll("-9223372036854775808", 0, 0) the value resulting from the conversion is 9223372036854775808 which can not even be represented in
the return type, so how can it be negated "in the return type"?
If the negation, which is a positive value, cannot be represented in the type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.
I think you're both overthinking it.
Kaz Kylheku <[email protected]> writes:
On 2024-06-24, Kaz Kylheku <[email protected]> wrote:
Errr, what am I saying! The negation, which is a negative value,
cannot be represented in the type, so the required behavior is to
return LLONG_MIN and set errno to negative.
You mean "and set errno to ERANGE".
There's still some ambiguity for strtoull("-9999999999999999999",
NULL, 10) (that's well outside the range of a 64-bit integer). For
that to work as expected, we have to assume that the determination
that "the correct value is outside the range of representable values"
happens *before* the negation "is performed in the return type".
It's not clear that this problem is worth fixing (doing so would
likely make that section longer and perhaps more confusing).
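For what it's worth, the boundary cases on the signed side are easy to probe. The sketch below assumes a 64-bit long long (true on every mainstream platform I know of, but not required by the standard), and strtoll_range_error is a made-up helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: stores strtoll()'s base-10 result in *out and
   returns 1 if ERANGE was reported, 0 otherwise. */
int strtoll_range_error(const char *s, long long *out)
{
    errno = 0;
    *out = strtoll(s, NULL, 10);
    return errno == ERANGE;
}
```

Under that assumption, "-9223372036854775808" should come back as LLONG_MIN with no error, "-9223372036854775809" as LLONG_MIN with ERANGE, and "9223372036854775808" as LLONG_MAX with ERANGE. That matches the reading in which the range check applies to the signed value as a whole and not to the digits before negation.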