Forum: >>> Magnum BBS <<<

"undefined behavior"?

From DFS@21:1/5 to All on Wed Jun 12 16:47:23 2024

Wrote a C program to mimic the stats shown on:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.

And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).

Very strange.

Edit: I just noticed I didn't initialize a char:
before: char outliers[100];
after : char outliers[100] = "";

And the problem went away. Reset it to before and problem came back.

Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?

Also, why doesn't gcc just do you a solid and initialize to "" for you?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Barry Schwarz@21:1/5 to DFS on Wed Jun 12 14:30:26 2024

On Wed, 12 Jun 2024 16:47:23 -0400, DFS <[email protected]> wrote:

> Wrote a C program to mimic the stats shown on:
>
> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>
> My code compiles and works fine - every stat matches - except for one
> anomaly: when using a dataset of consecutive numbers 1 to N, all
> values > 40 are flagged as outliers. Up to 40, no problem. Random
> numbers dataset of any size: no problem.
>
> And values 41+ definitely don't meet the conditions for outliers (using
> the IQR * 1.5 rule).
>
> Very strange.
>
> Edit: I just noticed I didn't initialize a char:
> before: char outliers[100];
> after : char outliers[100] = "";
>
> And the problem went away. Reset it to before and problem came back.
>
> Makes no sense. What could cause the program to go FUBAR at data point
> 41+ only when the dataset is consecutive numbers?
>
> Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

Since we cannot see your code, I will guess that you use a non-zero
value in outliers[i] to indicate that the corresponding value has been
identified as an outlier. Since you did not initialize the array
outliers, you have no idea what indeterminate value any element of the
array contains when your program begins execution. Apparently some of
them are non-zero. The fact that the first 40 are zero and the
remaining non-zero is merely an artifact of how your system builds
this particular program with that particular set of compile and link
options. Change anything and you could see completely different
behavior, or not.

I don't use gcc but, in debug mode, some compilers will put
recognizable "garbage values" in uninitialized variables so you can
spot the condition more easily.

In any case, the C language does not prevent you from shooting
yourself in the foot if you choose to. Evaluating an indeterminate
value is one fairly common way to do this.

--
Remove del for email


From David Brown@21:1/5 to DFS on Wed Jun 12 23:38:45 2024

On 12/06/2024 22:47, DFS wrote:

> Wrote a C program to mimic the stats shown on:
>
> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>
> My code compiles and works fine - every stat matches - except for one
> anomaly: when using a dataset of consecutive numbers 1 to N, all
> values > 40 are flagged as outliers. Up to 40, no problem. Random
> numbers dataset of any size: no problem.
>
> And values 41+ definitely don't meet the conditions for outliers (using
> the IQR * 1.5 rule).
>
> Very strange.
>
> Edit: I just noticed I didn't initialize a char:
> before: char outliers[100];
> after : char outliers[100] = "";
>
> And the problem went away. Reset it to before and problem came back.
>
> Makes no sense. What could cause the program to go FUBAR at data point
> 41+ only when the dataset is consecutive numbers?
>
> Also, why doesn't gcc just do you a solid and initialize to "" for you?

It is /really/ difficult to know exactly what your problem is without
seeing your C code! There may be other problems that you haven't seen yet.

Non-static local variables without initialisers have "indeterminate"
values. Trying to use these "indeterminate" values is undefined
behaviour - you have absolutely no control over what might happen. Any
particular behaviour you see is down to luck from the rest of the code
and what happened to be in memory at the time.

There is no automatic initialisation of non-static local variables,
because that would often be inefficient. The best way to avoid errors
like yours, IMHO, is not to declare such variables until you have data
to put in them - thus you always have a sensible initialiser of real
data. Occasionally that is not practical, but it works in most cases.

For a data array, zero initialisation is common. Typically you do this
with :

int xs[100] = { 0 };

That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.

I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to be
numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you -
enable warnings here. "gcc -Wall" enables a range of common warnings
with few false positives in normal well-written code, including ones
that check for attempts to read uninitialised data. "-Wextra" enables a
slew of extra warnings. Some of these will annoy people and trigger on
code they find reasonable, while most are good choices for a lot of code
- but personal preference varies significantly. And remember to enable
optimisation, since it makes the static checking more powerful.

If you /really/ want gcc to zero out such local data automatically, use
"-ftrivial-auto-var-init=zero". But it is much better to use warnings
and write correct code - options like that one are an addition to
well-checked code for paranoid software in security-critical contexts.


From Janis Papanagnou@21:1/5 to DFS on Wed Jun 12 23:38:55 2024

On 12.06.2024 22:47, DFS wrote:

> Wrote a C program to mimic the stats shown on:
>
> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>
> My code compiles and works fine - every stat matches - except for one
> anomaly: when using a dataset of consecutive numbers 1 to N, all
> values > 40 are flagged as outliers. Up to 40, no problem. Random
> numbers dataset of any size: no problem.
>
> And values 41+ definitely don't meet the conditions for outliers (using
> the IQR * 1.5 rule).
>
> Very strange.
>
> Edit: I just noticed I didn't initialize a char:
> before: char outliers[100];
> after : char outliers[100] = "";
>
> And the problem went away. Reset it to before and problem came back.
>
> Makes no sense. What could cause the program to go FUBAR at data point
> 41+ only when the dataset is consecutive numbers?
>
> Also, why doesn't gcc just do you a solid and initialize to "" for you?

Yeah, I had a similar problem like you; I had a declaration

char answer[100];

and was surprised that it wasn't initialized with "42".

Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

Janis


From DFS@21:1/5 to Barry Schwarz on Wed Jun 12 17:53:35 2024

On 6/12/2024 5:30 PM, Barry Schwarz wrote:

> On Wed, 12 Jun 2024 16:47:23 -0400, DFS <[email protected]> wrote:
>
>> Wrote a C program to mimic the stats shown on:
>>
>> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>>
>> My code compiles and works fine - every stat matches - except for one
>> anomaly: when using a dataset of consecutive numbers 1 to N, all
>> values > 40 are flagged as outliers. Up to 40, no problem. Random
>> numbers dataset of any size: no problem.
>>
>> And values 41+ definitely don't meet the conditions for outliers (using
>> the IQR * 1.5 rule).
>>
>> Very strange.
>>
>> Edit: I just noticed I didn't initialize a char:
>> before: char outliers[100];
>> after : char outliers[100] = "";
>>
>> And the problem went away. Reset it to before and problem came back.
>>
>> Makes no sense. What could cause the program to go FUBAR at data point
>> 41+ only when the dataset is consecutive numbers?
>>
>> Also, why doesn't gcc just do you a solid and initialize to "" for you?
>
> Makes perfect sense. The first rule of undefined behavior is
> "Whatever happens is exactly correct." You are not entitled to any
> expectations and none of the behavior (or perhaps all of the behavior)
> can be called unexpected.

I HATE bogus answers like this.

Aren't you embarrassed to say things like that?

> Since we cannot see your code, I will guess that you use a non-zero
> value in outliers[i] to indicate that the corresponding value has been
> identified as an outlier.

No.

I compare the data point to the lower and upper bounds of a stat rule
commonly called the "IQR Rule":

lo = Q1 - (1.5 * IQR)
hi = Q3 + (1.5 * IQR)

If it falls outside the range of lo-hi I strcat the value to a char.
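
For reference, the rule above can be sketched as a pair of small helpers. This is only an illustration: the names fences, iqr_fences and is_outlier are made up here and are not taken from the program below.

```c
#include <stdbool.h>

/* Lower and upper fences for the 1.5 * IQR rule (hypothetical type). */
struct fences { double lo, hi; };

/* Compute the fences from the first and third quartiles. */
struct fences iqr_fences(double q1, double q3)
{
    double iqr = q3 - q1;
    struct fences f = { q1 - 1.5 * iqr, q3 + 1.5 * iqr };
    return f;
}

/* A value is an outlier only if it lies strictly outside the fences. */
bool is_outlier(double x, struct fences f)
{
    return x < f.lo || x > f.hi;
}
```

With the quartiles the program computes for 1..100 (Q1 = 25.5, Q3 = 75.5), the fences come out at -49.5 and 150.5, so no value from 1 to 100 can legitimately be flagged.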

The outlier routine starts line 170.

If you change

char outliers[200]="", temp[10]="";
to
char outliers[200], temp[10];

you might see what happens when you run the program for consecutive values:

$ ./prog 100 -c

=========================================================================

//this code is hereby released to the public domain

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>

/*
this program computes the descriptive statistics of a randomly
generated set of N integers

1.0 release Dec 2020
2.0 release Jun 2024

used the population skewness and Kurtosis formulas from:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
also test the results of this code against that site

compile: gcc -Wall prog.c -o prog -lm
usage : ./prog N -option (where N is 2 or higher, and option is -r or
-c or -o)
-r generates N random numbers
-c generates consecutive numbers 1 to N
-o generates random numbers with outliers
*/

//random ints
int randNbr(int low, int high) {
    return (low + rand() / (RAND_MAX / (high - low + 1) + 1));
}

//comparator function used with qsort
int compareint (const void * a, const void * b)
{
    if (*(int*)a > *(int*)b) return 1;
    else if (*(int*)a < *(int*)b) return -1;
    else return 0;
}

int main(int argc, char *argv[])
{
    if(argc < 3) {
        printf("Missing argument:\n");
        printf(" * enter a number greater than 2\n");
        printf(" * enter an option -r -c or -o\n");
        exit(0);
    }

    //vars
    int i=0, lastmode=0;
    int N = atoi(argv[1]);
    int nums[N];
    //int *nums = malloc(N * sizeof(int));

    double sumN=0.0, median=0.0, Q1=0.0, Q2=0.0, Q3=0.0, IQR=0.0;
    double stddev = 0.0, kurtosis = 0.0;
    double sqrdiffmean = 0.0, cubediffmean = 0.0, quaddiffmean = 0.0;
    double meanabsdev = 0.0, rootmeansqr = 0.0;
    char mode[100], tmp[12];

    //generate random dataset
    if(strcmp(argv[2],"-r") == 0) {
        srand(time(NULL));
        for(i=0;i<N;i++) { nums[i] = randNbr(1,N*3); }

        printf("%d Randoms:\n", N);
        printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
        printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
        qsort(nums,N,sizeof(int),compareint);
        printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
        printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //generate random dataset with outliers
    if(strcmp(argv[2],"-o") == 0) {
        srand(time(NULL));
        nums[0] = 1; nums[1] = 3;
        for(i=2;i<N-2;i++) { nums[i] = randNbr(100,N*30); }
        nums[N-2] = 1000; nums[N-1] = 2000;

        printf("%d Randoms with outliers:\n", N);
        printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
        printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
        qsort(nums,N,sizeof(int),compareint);
        printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
        printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //generate consecutive numbers 1 to N
    if(strcmp(argv[2],"-c") == 0) {
        for(i=0;i<N;i++) { nums[i] = i + 1; }

        printf("%d Consecutive:\n", N);
        printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
        printf("\nWith commas : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //various
    for(i=0;i<N;i++) {sumN += nums[i];}
    double min = nums[0], max = nums[N-1];

    //calc descriptive stats
    double mean = sumN / (double)N;
    int ucnt = 1, umaxcnt=1;
    for(i = 0; i < N; i++)
    {
        sqrdiffmean += pow(nums[i] - mean, 2); // for variance and sum squares
        cubediffmean += pow(nums[i] - mean, 3); // for skewness
        quaddiffmean += pow(nums[i] - mean, 4); // for Kurtosis
        meanabsdev += fabs((nums[i] - mean)); // for mean absolute deviation
        rootmeansqr += nums[i] * nums[i]; // for root mean square

        //mode
        if(ucnt == umaxcnt && lastmode != nums[i])
        {
            sprintf(tmp,"%d ",nums[i]);
            strcat(mode,tmp);
        }

        if(nums[i]-nums[i+1]!=0) {ucnt=1;} else {ucnt++;}

        if(ucnt>umaxcnt)
        {
            umaxcnt=ucnt;
            memset(mode, '\0', sizeof(mode));
            sprintf(tmp, "%d ", nums[i]);
            strcat(mode, tmp);
            lastmode = nums[i];
        }
    }

    // median and quartiles
    // quartiles divide sorted dataset into four sections
    // Q1 = median of values less than Q2
    // Q2 = median of the data set
    // Q3 = median of values greater than Q2
    if(N % 2 == 0) {
        Q2 = median = (nums[(N/2)-1] + nums[N/2]) / 2.0;
        i = N/2;
        if(i % 2 == 0) {
            Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
            Q3 = (nums[i + ((i-1)/2)] + nums[i+(i/2)]) / 2.0;
        }
        if(i % 2 != 0) {
            Q1 = nums[(i-1)/2];
            Q3 = nums[i + ((i-1)/2)];
        }
    }

    if(N % 2 != 0) {
        Q2 = median = nums[(N-1)/2];
        i = (N-1)/2;
        if(i % 2 == 0) {
            Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
            Q3 = (nums[i + (i/2)] + nums[i + (i/2) + 1]) / 2.0;
        }
        if(i % 2 != 0) {
            Q1 = nums[(i-1)/2];
            Q3 = nums[i + ((i+1)/2)];
        }
    }

    // outliers: below Q1−1.5xIQR, or above Q3+1.5xIQR
    IQR = Q3 - Q1;
    char outliers[200]="", temp[10]="";
    if (N > 3) {

        //range for outliers
        double lo = Q1 - (1.5 * IQR);
        double hi = Q3 + (1.5 * IQR);

        //no outliers
        if ( min > lo && max < hi) {
            strcat(outliers,"none (using IQR * 1.5 rule)");
        }

        //at least one outlier
        if ( min < lo || max > hi) {
            for(i = 0; i < N; i++) {
                double val = (double)nums[i];
                if(val < lo || val > hi) {
                    sprintf(temp,"%.0f ",val);
                    temp[strlen(temp)] = '\0';
                    strcat(outliers,temp);
                }
            }
            strcat(outliers," (using IQR * 1.5 rule)");
        }
        outliers[strlen(outliers)] = '\0';
    }

    stddev = sqrt(sqrdiffmean/N);
    kurtosis = quaddiffmean / (N * pow(sqrt(sqrdiffmean/N),4));

    //output
    printf("\n--------------------------------------------------------------\n");
    printf("Minimum = %.0f\n", min);
    printf("Maximum = %.0f\n", max);
    printf("Range = %.0f\n", max - min);
    printf("Size N = %d\n" , N);
    printf("Sum N = %.0f\n", sumN);
    printf("Mean μ = %.2f\n", mean);
    printf("Median = %.1f\n", median);
    if(umaxcnt > 1) {
        printf("Mode(s) = %s (%d occurrences ea)\n", mode,umaxcnt);}
    if(umaxcnt < 2) {
        printf("Mode(s) = na (no repeating values)\n");}
    printf("Std Dev σ = %.4f\n", stddev);
    printf("Variance σ^2 = %.4f\n", sqrdiffmean/N);
    printf("Mid Range = %.1f\n", (max + min)/2);
    printf("Quartiles");
    if(N > 3) {printf(" Q1 = %.1f\n", Q1);}
    if(N < 4) {printf(" Q1 = na\n");}
    printf(" Q2 = %.1f (median)\n", Q2);
    if(N > 3) {printf(" Q3 = %.1f\n", Q3);}
    if(N < 4) {printf(" Q3 = na\n");}
    printf("IQR = %.1f (interquartile range)\n", IQR);
    if(N > 3) {printf("Outliers = %s\n", outliers);}
    if(N < 4) {printf("Outliers = na\n");}
    printf("Sum Squares SS = %.2f\n", sqrdiffmean);
    printf("MAD = %.4f (mean absolute deviation)\n", meanabsdev / N);
    printf("Root Mean Sqr = %.4f\n", sqrt(rootmeansqr / N));
    printf("Std Error Mean = %.4f\n", stddev / sqrt(N));
    printf("Skewness γ1 = %.4f\n", cubediffmean / (N * pow(sqrt(sqrdiffmean/N),3)));
    printf("Kurtosis β2 = %.4f\n", kurtosis);
    printf("Kurtosis Excess α4 = %.4f\n", kurtosis - 3);
    printf("CV = %.6f (coefficient of variation\n", sqrt(sqrdiffmean/N) / mean);
    printf("RSD = %.4f%% (relative std deviation)\n", 100 * (sqrt(sqrdiffmean/N) / mean));
    printf("--------------------------------------------------------------\n");
    printf("Check results against\n");
    printf("https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php");
    printf("\n\n");

    //free(nums);
    return(0);
}

=========================================================================

> Since you did not initialize the array
> outliers, you have no idea what indeterminate value any element of the
> array contains when your program begins execution. Apparently some of
> them are non-zero. The fact that the first 40 are zero and the
> remaining non-zero is merely an artifact of how your system builds
> this particular program with that particular set of compile and link
> options. Change anything and you could see completely different
> behavior, or not.
>
> I don't use gcc but, in debug mode, some compilers will put
> recognizable "garbage values" in uninitialized variables so you can
> spot the condition more easily.
>
> In any case, the C language does not prevent you from shooting
> yourself in the foot if you choose to. Evaluating an indeterminate
> value is one fairly common way to do this.


From DFS@21:1/5 to Keith Thompson on Wed Jun 12 18:34:22 2024

On 6/12/2024 6:22 PM, Keith Thompson wrote:

> Janis Papanagnou <[email protected]> writes:
>> On 12.06.2024 22:47, DFS wrote:
>> [...]
>>> before: char outliers[100];
>>> after : char outliers[100] = "";
>> [...]
>> Seriously; why do you expect [in C] a declaration to initialize that
>> stack object? (There are other languages that do initializations as
>> the language defines it, but C doesn't; it may help to learn before
>> programming in any language?) And why do you think that "" would be
>> an appropriate initialization (i.e. a single '\0' character) and not
>> all 100 elements set to '\0'? (Someone else might want to access the
>> element 'answer[99]'.) And should we pay for initializing 1000000000
>> characters in case one declares an appropriate huge array?
>
> This:
>     char outliers[100] = "";
> initializes all 100 elements to zero. So does this:
>     char outliers[100] = { '\0' };
> Any elements or members not specified in an initializer are set to zero.
>
> If you want to set an array's 0th element to 0 and not waste time
> initializing the rest, you can assign it separately:
>     char outliers[100];
>     outliers[0] = '\0';
> or
>     char outliers[100];
>     strcpy(outliers, "");
> though the overhead of the function call is likely to outweigh the
> cost of initializing the array.
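
The point about initializers is easy to check. Here is a minimal sketch; the helper names all_zero and initializers_zero_everything are invented for this example and appear nowhere in the thread's program.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..n-1] is zero. */
int all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0') return 0;
    }
    return 1;
}

/* Returns 1 if both initializer forms leave the whole array zeroed. */
int initializers_zero_everything(void)
{
    char a[100] = "";        /* element 0 from the literal, the rest zeroed */
    char b[100] = { '\0' };  /* same effect with a brace initializer */
    return all_zero(a, sizeof a) && all_zero(b, sizeof b);
}
```

Both forms zero all 100 elements, so either is enough to make a later strcat() start from an empty string.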

Thanks. I'll have to remember these things. I like to use char arrays.

The problem is I don't use C very often, so I don't develop muscle memory.


From DFS@21:1/5 to Keith Thompson on Wed Jun 12 19:07:29 2024

On 6/12/2024 6:30 PM, Keith Thompson wrote:

> DFS <[email protected]> writes:
>> On 6/12/2024 5:30 PM, Barry Schwarz wrote:
>>> On Wed, 12 Jun 2024 16:47:23 -0400, DFS <[email protected]> wrote:
>>>> Wrote a C program to mimic the stats shown on:
>>>>
>>>> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>>>>
>>>> My code compiles and works fine - every stat matches - except for one
>>>> anomaly: when using a dataset of consecutive numbers 1 to N, all
>>>> values > 40 are flagged as outliers. Up to 40, no problem. Random
>>>> numbers dataset of any size: no problem.
>>>>
>>>> And values 41+ definitely don't meet the conditions for outliers (using
>>>> the IQR * 1.5 rule).
>>>>
>>>> Very strange.
>>>>
>>>> Edit: I just noticed I didn't initialize a char:
>>>> before: char outliers[100];
>>>> after : char outliers[100] = "";
>>>>
>>>> And the problem went away. Reset it to before and problem came back.
>>>>
>>>> Makes no sense. What could cause the program to go FUBAR at data point
>>>> 41+ only when the dataset is consecutive numbers?
>>>>
>>>> Also, why doesn't gcc just do you a solid and initialize to "" for you?
>>>
>>> Makes perfect sense. The first rule of undefined behavior is
>>> "Whatever happens is exactly correct." You are not entitled to any
>>> expectations and none of the behavior (or perhaps all of the behavior)
>>> can be called unexpected.
>>
>> I HATE bogus answers like this.
>>
>> Aren't you embarrassed to say things like that?
>
> He has nothing to be embarrassed about. What he wrote is correct.

No it's not.

"Whatever happens is exactly correct." is nonsense.

"You are not entitled to any expectations" is nonsense.

> The C standard's definition of "undefined behavior" is "behavior, upon
> use of a nonportable or erroneous program construct or of erroneous
> data, for which this International Standard imposes no requirements".
>
> If you don't like the way C deals with undefined behavior, that's
> perfectly valid, and a lot of people are likely to agree with you.

Thanks for feeling my pain!

It's frustrating. By now I spent a half-hour dealing with it. gcc
could've just filled the char[] variable with 0s by default. I bet that
would save a LOT of people time and headaches.

> But I advise against lashing out at people who are correctly explaining
> what the C standard says.

The C standard really says "Whatever happens is exactly correct."?

> DFS, since you've been posting in comp.lang.c for at least ten years,

Time flies.

How do you know I've posted here that long?

> I'm surprised you're having difficulties with this.

I'm surprised at some of the wonkiness of gcc and C.

* warns relentlessly when the printf specifier doesn't match the var
type, but gives no warning when you use an int with memset (instead of
the size_t specified in the function prototype).

* a missing bracket } throws 50 nonsensical compiler errors.

* warns of unused vars but not uninitialized ones

* one uninitialized var makes your program do crazy things. Worse than
crazy is it's identically crazy each time.

./prog 40 -c
outliers: none

./prog 41 -c
outliers: 41

./prog 42 -c
outliers: 41 42

./prog 43 -c
outliers: 41 42 43

./prog 44 -c
outliers: 41 42 43 44

etc. And none were outliers - not even close.

At least if it showed nonsense data it would be easier to track down.
Maybe.

The thing is, none of those values (40+) were ever in that char[] prior
to running the code for a set of 50 consecutive values.

And I edited/compiled the code many times, but still got the identical
error.

I doubt my environment (gcc 11.4 on Windows Subsys for Linux on Ubuntu)
has anything to do with it.


From DFS@21:1/5 to David Brown on Wed Jun 12 18:29:27 2024

On 6/12/2024 5:38 PM, David Brown wrote:

> On 12/06/2024 22:47, DFS wrote:
>> Wrote a C program to mimic the stats shown on:
>>
>> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>>
>> My code compiles and works fine - every stat matches - except for one
>> anomaly: when using a dataset of consecutive numbers 1 to N, all
>> values > 40 are flagged as outliers. Up to 40, no problem. Random
>> numbers dataset of any size: no problem.
>>
>> And values 41+ definitely don't meet the conditions for outliers
>> (using the IQR * 1.5 rule).
>>
>> Very strange.
>>
>> Edit: I just noticed I didn't initialize a char:
>> before: char outliers[100];
>> after : char outliers[100] = "";
>>
>> And the problem went away. Reset it to before and problem came back.
>>
>> Makes no sense. What could cause the program to go FUBAR at data
>> point 41+ only when the dataset is consecutive numbers?
>>
>> Also, why doesn't gcc just do you a solid and initialize to "" for you?
>
> It is /really/ difficult to know exactly what your problem is without
> seeing your C code! There may be other problems that you haven't seen yet.

The outlier section starts on line 169.

[Program listing snipped: identical to the one in my reply to Barry
Schwarz, minus the two commented-out malloc/free lines.]

> Non-static local variables without initialisers have "indeterminate"
> values. Trying to use these "indeterminate" values is undefined
> behaviour - you have absolutely no control over what might happen. Any
> particular behaviour you see is down to luck from the rest of the code
> and what happened to be in memory at the time.

In 2024 that's surprising. I can't be the only one to forget to
initialize a char[] variable.

> There is no automatic initialisation of non-static local variables,
> because that would often be inefficient.

It would've saved me half an hour of frustration.

Now I'm getting 'stack smashing detected' errors (after the program runs correctly) when using datasets of consecutive numbers.

hmmmm 2 issues in a row using consecutives - that's a clue!

The best way to avoid errors
like yours, IMHO, is not to declare such variables until you have data
to put in them - thus you always have a sensible initialiser of real
data. Occasionally that is not practical, but it works in most cases.

Data is definitely going in them: either the value 'none' or a list of
the outliers and some text.

For a data array, zero initialisation is common. Typically you do this
with :

int xs[100] = { 0 };

That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.

I recommend never using "char" as a type unless you really mean a character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

I did mean characters, limited to: 0-9a-zA-Z()

I think I'm using the char variable correctly.
sprintf(tempchar,"%d ",outlier);
strcat(char,tempchar);

A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you - enable warnings here. "gcc -Wall" enables a range of common warnings
with few false positives in normal well-written code, including ones
that check for attempts to read uninitialised data.

I always use -Wall, and I was using it here.

"-Wextra" enables a slew of extra warnings. Some of these will annoy people and trigger on
code they find reasonable, while most are good choices for a lot of code
- but personal preference varies significantly. And remember to enable optimisation, since it makes the static checking more powerful.

Just did this:
gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm

and no warnings or errors at all.

But: it now aborts near the front when using consecutive data points
(but not randoms).

*** buffer overflow detected ***: terminated
Aborted

I'm actually happy about that. I should be able to find and fix it.

If you /really/ want gcc to zero out such local data automatically, use "-ftrivial-auto-var-init=zero". But it is much better to use warnings
and write correct code - options like that one are an addition to well-checked code for paranoid software in security-critical contexts.

Great answer! I can always count on D Brown for excellent advice.
Thank you.

From Janis Papanagnou@21:1/5 to Keith Thompson on Thu Jun 13 02:19:59 2024

On 13.06.2024 00:22, Keith Thompson wrote:

This:
char outliers[100] = "";
initializes all 100 elements to zero. So does this:
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Oops! This surprised me. (But you are right.) The overhead isn't [syntactically] obvious, but I'm anyway always setting a single
'\0' character if I want to store strings in a 'char[]' and have
it initialized to an empty string (like below).

If you want to set an array's 0th element to 0 and not waste time initializing the rest, you can assign it separately:
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

It wouldn't occur to me to use the strcpy() function, but is the
function call really that expensive in C ?

Janis

From Ike Naar@21:1/5 to DFS on Thu Jun 13 07:25:58 2024

On 2024-06-12, DFS <[email protected]> wrote:

//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

strcat(outliers,"none (using IQR * 1.5 rule)");
}

//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}

From bart@21:1/5 to DFS on Thu Jun 13 10:43:51 2024

On 12/06/2024 21:47, DFS wrote:

Wrote a C program to mimic the stats shown on:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.

And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).

Very strange.

Edit: I just noticed I didn't initialize a char:
before: char outliers[100];
after : char outliers[100] = "";

And the problem went away. Reset it to before and problem came back.

Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?

I assume outliers is inside a function.

What are the 100 values of outliers if you don't initialise it? You can
try printing them out (as individual numbers not as a string) although
just doing that, and adding that extra code, may change the actual values.

However that doesn't matter if it still goes wrong; you may still get a
hint as to why it's behaving as it is.

Also, why doesn't gcc just do you a solid and initialize to "" for you?

Initialising to "" will zero the entire array. You really want the
compiler to do that work, even when you're going to overwrite it anyway?

From David Brown@21:1/5 to Keith Thompson on Thu Jun 13 14:42:22 2024

On 13/06/2024 00:18, Keith Thompson wrote:

David Brown <[email protected]> writes:
[...]

I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

[...]

The implementation-definedness of plain char is awkward, but char
arrays generally work just fine for UTF-8 strings.

Yes, but "generally work" is not quite as strong as I would like. My preference for UTF-8 strings is a const unsigned char type (with C23, it
will be char8_t, which is defined to be the same type as "unsigned
char"). But u8"Hello, world" UTF-8 string literals (since C11) are
considered to be like an array of type "char" in C (until C23), so I
guess UTF-8 strings will be safe in plain char arrays. Still, the bytes
in a UTF-8 string are code units with values between 0 and 255, so I
prefer to store these in a type that can hold that range of values.

(What happens if you have a platform that uses ones' complement
arithmetic, with "char" being signed and a range of -127 to +127, and
you have a u8"..." string which has a code unit of 0x80 that cannot be represented in "char" ? It's just a hypothetical question, of course.)

If char is
signed, byte values greater than 127 will be stored as negative
values, but it will almost certainly just work (if your system
is configured to handle UTF-8). Likewise for Latin-1 and similar
8-bit character sets.

The standard string functions operate on arrays of plain char, so
storing UTF-8 strings in arrays of uint8_t or unsigned char will
seriously restrict what you can do with them.

(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

I would also prefer that, but too much existing code relies on plain
char being signed on the platforms it runs on. I personally think the
idea of having signed or unsigned characters is a very poor choice of
names for the terms, but it's way too late to change that! C23 has
"char8_t" which is always unsigned.

(In C23, "char8_t" is defined in <uchar.h> and is the same type as
"unsigned char". In C++20, in contrast, "char8_t" is a keyword and a
distinct type with identical size and range to "unsigned char".)

From David Brown@21:1/5 to Janis Papanagnou on Thu Jun 13 15:28:20 2024

On 13/06/2024 02:19, Janis Papanagnou wrote:

On 13.06.2024 00:22, Keith Thompson wrote:

This:
char outliers[100] = "";
initializes all 100 elements to zero. So does this:
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Oops! This surprised me. (But you are right.) The overhead isn't [syntactically] obvious, but I'm anyway always setting a single
'\0' character if I want to store strings in a 'char[]' and have
it initialized to an empty string (like below).

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

It wouldn't occur to me to use the strcpy() function, but is the
function call really that expensive in C ?

That depends on your toolchain.

If you are using a Windows-based compiler with an external DLL for the C library and the compiler doesn't handle the strcpy() directly, then it
can be quite a lot of overhead. You have the call to the DLL, which
involves a few steps of indirection. The library strcpy() may be
optimised for handling large strings, and may save and restore a lot of registers (such as SIMD vector registers).

If you are using a compiler (whatever the platform) that optimises
"strcpy", it will generate identical code to "outliers[0] = '\0';".

From David Brown@21:1/5 to DFS on Thu Jun 13 15:21:54 2024

On 13/06/2024 00:34, DFS wrote:

On 6/12/2024 6:22 PM, Keith Thompson wrote:

Janis Papanagnou <[email protected]> writes:

On 12.06.2024 22:47, DFS wrote:

[...]

before: char outliers[100];
after : char outliers[100] = "";

[...]

Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

This:
     char outliers[100] = "";
initializes all 100 elements to zero. So does this:
     char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Yes. It's good to point that out, since people might assume that using
a string literal here only initialises the bit covered by that string
literal.

(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy() is
often inlined for such uses.

Thanks. I'll have to remember these things. I like to use char arrays.

The problem is I don't use C very often, so I don't develop muscle memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

From David Brown@21:1/5 to DFS on Thu Jun 13 15:15:55 2024

On 13/06/2024 00:29, DFS wrote:

On 6/12/2024 5:38 PM, David Brown wrote:

On 12/06/2024 22:47, DFS wrote:

Wrote a C program to mimic the stats shown on:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.

And values 41+ definitely don't meet the conditions for outliers
(using the IQR * 1.5 rule).

Very strange.

Edit: I just noticed I didn't initialize a char:
before: char outliers[100];
after : char outliers[100] = "";

And the problem went away. Reset it to before and problem came back.

Makes no sense. What could cause the program to go FUBAR at data
point 41+ only when the dataset is consecutive numbers?

Also, why doesn't gcc just do you a solid and initialize to "" for you?

It is /really/ difficult to know exactly what your problem is without
seeing your C code! There may be other problems that you haven't seen
yet.

The outlier section starts on line 169
=====================================================================================

<snip>

Apart from the initialisation issue, I would suggest you re-consider the
way you add strings to the "outliers" buffer. If there are too many of
them, it will overflow - there's nothing to stop you putting more than
200 characters into it. I would recommend dropping the "temp" variable
and instead keep track of a pointer to the terminating null character of
your current "outliers" string. Use "snprintf" to "print" directly into
the string, rather than going via "temp", and use the return value of
the "snprintf" to update your end pointer. You will easily be able to
avoid the risk of overrun, while also being slightly more efficient too.

The line:

outliers[strlen(outliers)] = '\0';

is completely useless. "strlen" starts at the beginning of "outliers",
and counts along until it finds a null character - thus either "outliers[strlen(outliers)]" is already equal to '\0', or your attempt
at calculating "strlen" with an overrun buffer will lead to more
undefined behaviour.

Non-static local variables without initialisers have "indeterminate"
value if there is no initialiser. Trying to use these "indeterminate"
values is undefined behaviour - you have absolutely no control over
what might happen. Any particular behaviour you see is down to luck
from the rest of the code and what happened to be in memory at the time.

In 2024 that's surprising. I can't be the only one to forget to
initialize a char[] variable.

You are not - attempting to use an uninitialised variable is a common
error. That is why C compilers provide warnings about this kind of
thing, along with run-time tools like the sanitizers Ben recommended, to
help find such mistakes. But compiler vendors can't force people to use
such tools and warning flags, nor can the tools find /all/ cases of
errors. At some point, programmers have to take responsibility for
knowing the language they are using, and writing their code correctly.
Good tools and good use of those tools is an aid to careful coding, not
an alternative to it.

There is no automatic initialisation of non-static local variables,
because that would often be inefficient.

It would've saved me half an hour of frustration.

And the things you have learned as a result - from your own debugging,
and the threads here - will save you many more hours of frustration in
the future.

There are languages that focus on ease of use and do all the management
of things like strings and buffers, and prevent users from mistakes like
this, at the cost of slower run-times. There are languages that do very
little automatically for the programmer and have absolutely minimal
overheads, for maximal efficiency. C is the latter kind of language.

Remember, while you might see automatic initialisation of local
variables as a negligible overhead, other people might not - I've worked
on C code for microcontrollers where a wasted processor cycle or two is
too much. If your code does not care about such efficiencies, then you
have to question whether C is the right language in the first place. I
believe most modern code that is written in C would be better if it were written in other higher level languages (precisely because a half hour
of /your/ time is usually more valuable than a few microseconds of your computer's time).

On the subject of initialisation, I strongly suggest that you do /not/
get in the habit of always initialising your variables to 0 when you
define them. Do that only if 0 is the real, appropriate starting value.
Prefer to avoid declaring the variable at all until you need it, then
define it with its initial value (and consider making it "const" to
reduce the risk of other coding errors). If the structure of the code
requires you to define the variable before you have a value for it,
prefer to leave it without an initial value. Then compiler warnings
have a much better chance of spotting mistakes.

Now I'm getting 'stack smashing detected' errors (after the program runs correctly) when using datasets of consecutive numbers.

I think Ben found that buffer overrun for you, and showed you how to
find it yourself in the future.

hmmmm 2 issues in a row using consecutives - that's a clue!

The best way to avoid errors like yours, IMHO, is not to declare such
variables until you have data to put in them - thus you always have a
sensible initialiser of real data. Occasionally that is not
practical, but it works in most cases.

Data is definitely going in them: either the value 'none' or a list of
the outliers and some text.

Now that I have your source code, I can see the error is the way you put
data in - strcat() reads the existing data, it does not just write data.

For a data array, zero initialisation is common. Typically you do
this with :

int xs[100] = { 0 };

That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.

I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

I did mean characters, limited to: 0-9a-zA-Z()

OK.

I think I'm using the char variable correctly.
sprintf(tempchar,"%d ",outlier);
strcat(char,tempchar);

Yes. Without your source code, I could only guess.

But see earlier in this post for a suggestion to improve your use of the variable.

A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you
- enable warnings here. "gcc -Wall" enables a range of common
warnings with few false positives in normal well-written code,
including ones that check for attempts to read uninitialised data.

I always use -Wall, and I was using it here.

Good. Unfortunately, good though gcc is, it is not perfect. Improving warnings is a continuous endeavour for the gcc developers, but they
usually have to err on the side of avoiding false positives.

"-Wextra" enables a slew of extra warnings. Some of these will annoy people and trigger
on code they find reasonable, while most are good choices for a lot of
code - but personal preference varies significantly. And remember to
enable optimisation, since it makes the static checking more powerful.

Just did this:
gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm

"-O3" is rarely much use - stick to "-O2" for normal use. The extra optimisations enabled by "-O3" help in some code, but work worse on
other code due to the increased size, so they should be used with care. Certainly "-O3" is rarely worth it unless you are also using a "-march="
flag (such as "-march=native") to tune for a particular processor and
enable stuff like vectorisation. Getting the fastest code is more of an
art than a science!

and no warnings or errors at all.

But: it now aborts near the front when using consecutive data points
(but not randoms).

*** buffer overflow detected ***: terminated
Aborted

I'm actually happy about that. I should be able to find and fix it.

If you /really/ want gcc to zero out such local data automatically,
use "-ftrivial-auto-var-init=zero". But it is much better to use
warnings and write correct code - options like that one are an
addition to well-checked code for paranoid software in
security-critical contexts.

Great answer! I can always count on D Brown for excellent advice.
Thank you.

I try :-)

You get the best results by combining the advice from a variety of people
here, along with your own experimentations.

From DFS@21:1/5 to Ike Naar on Thu Jun 13 11:13:04 2024

On 6/13/2024 3:25 AM, Ike Naar wrote:

On 2024-06-12, DFS <[email protected]> wrote:

//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

strcat(outliers,"none (using IQR * 1.5 rule)");
}

//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

Thanks.

strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}

From DFS@21:1/5 to David Brown on Thu Jun 13 10:38:12 2024

On 6/13/2024 9:21 AM, David Brown wrote:

On 13/06/2024 00:34, DFS wrote:

On 6/12/2024 6:22 PM, Keith Thompson wrote:

Janis Papanagnou <[email protected]> writes:

On 12.06.2024 22:47, DFS wrote:

[...]

before: char outliers[100];
after : char outliers[100] = "";

[...]

Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

This:
     char outliers[100] = "";
initializes all 100 elements to zero. So does this:
     char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Yes. It's good to point that out, since people might assume that using
a string literal here only initialises the bit covered by that string literal.

(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy() is often inlined for such uses.

Thanks. I'll have to remember these things. I like to use char arrays.

The problem is I don't use C very often, so I don't develop muscle
memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

I like C for its blazing speed. Very addicting. And it's much more challenging/frustrating than python.

I coded a subset (8 stat measures) of this C program 3.5 years ago, and recently decided to finish duplicating all 23 stats shown at:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

Working on the outliers code, I decided to add an option to generate
data with consecutive numbers. That's when I ran $./dfs 50 -c and
noticed every value above 40 was considered an outlier. And this didn't
change over a bunch of code edits/file saves/compiles.

Understanding how an uninitialized variable caused that persistent issue
is beyond my pay grade.

That's when I whined to clc. Before I even posted, though, I spotted
the uninitialized var (outliers). Later I spotted another one (mode).

One led to 'undefined behavior', the other to 'stack smashing'. Both
only occurred when using consecutive numbers.

But with y'all's help I believe I found and fixed ALL issues. I can
dream anyway.

From Scott Lurndal@21:1/5 to Malcolm McLean on Thu Jun 13 15:39:25 2024

Malcolm McLean <[email protected]> writes:

On 13/06/2024 01:33, Keith Thompson wrote:

printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
by the format string. This:
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

We just can't have size_t variables swilling around in programs for these reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

From Lew Pitcher@21:1/5 to DFS on Thu Jun 13 15:49:46 2024

On Thu, 13 Jun 2024 11:13:04 -0400, DFS wrote:

On 6/13/2024 3:25 AM, Ike Naar wrote:

On 2024-06-12, DFS <[email protected]> wrote:

//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

[1] How does the above statement evaluate when (min == lo) and (max == hi)?

strcat(outliers,"none (using IQR * 1.5 rule)");
}

//at least one outlier
if ( min < lo || max > hi) {

[2] How does the above statement evaluate when (min == lo) and (max == hi)?

[3] Given the answers to questions 1 and 2, are there any values that
satisfy /both/ the "no outliers" and "at least one outlier" conditions?
Are there any values that satisfy /neither/ condition?

[snip]

HTH
--
Lew Pitcher
"In Skills We Trust"

From Scott Lurndal@21:1/5 to DFS on Thu Jun 13 15:40:29 2024

DFS <[email protected]> writes:

On 6/13/2024 3:25 AM, Ike Naar wrote:

temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

Thanks.

Most programmers should consider sprintf to be deprecated and
should never use it. snprintf is safer and more capable.

From Ben Bacarisse@21:1/5 to Scott Lurndal on Thu Jun 13 18:08:03 2024

[email protected] (Scott Lurndal) writes:

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

And C provides "%zu".

--
Ben.

From DFS@21:1/5 to Lew Pitcher on Thu Jun 13 13:05:43 2024

On 6/13/2024 11:49 AM, Lew Pitcher wrote:

On Thu, 13 Jun 2024 11:13:04 -0400, DFS wrote:

On 6/13/2024 3:25 AM, Ike Naar wrote:

On 2024-06-12, DFS <[email protected]> wrote:

//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

[1] How does the above statement evaluate when (min == lo) and (max == hi)?

//at least one outlier
if ( min < lo || max > hi) {

[2] How does the above statement evaluate when (min == lo) and (max == hi)?

[3] Given the answers to questions 1 and 2, are there any values that
satisfy /both/ the "no outliers" and "at least one outlier" conditions?
Are there any values that satisfy /neither/ condition?

[snip]

HTH

It does help. The original code won't miss any outliers, but it also
won't notify you there were none in the exceedingly rare case that the
bounds of the dataset exactly match the bounds of the outlier rule.

No outliers test:
Orig : if (min > lo && max < hi)
Fixed: if (min >= lo && max <= hi)

At least one outlier test:
Orig: if (min < lo || max > hi) {
No fix necessary

Thanks Lew.

From bart@21:1/5 to Scott Lurndal on Thu Jun 13 19:01:23 2024

On 13/06/2024 16:39, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 01:33, Keith Thompson wrote:

printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
by the format string. This:
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?

What about the compilers that can't do that?

What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?

What about when the format string is a variable?

What about the example given below?

It is definitely a language problem. Dealing with some of it with some compilers with some options isn't a solution, it's just a workaround.

Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

And here it just gets even uglier. You also get situations like this:

uint64_t i=0;
printf("%lld\n", i);

This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.

It can't tell you that you should be using one of those ludicrous macros.

I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say anything.

From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 00:55:12 2024

Malcolm McLean <[email protected]> writes:

On 13/06/2024 19:01, bart wrote:

And here it just gets even uglier. You also get situations like this:
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another language or do what you have to in C.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure. But it also limits allocation to slightly under INT_MAX. Which should be plenty for a Baby program, and if you want more, you always have big boy's malloc.

And if you need to change the size?

But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

For people taught to ignore size_t, care is also needed when calling
functions that take size_t arguments as the signed to unsigned
conversion can cause surprises when not flagged by the compiler. I
don't know if I am right, but I would bet that many of the "don't bother
with size_t" crowd are also in the "don't bother with all those warning
flags to the compiler" crowd.

and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

--
Ben.


From bart@21:1/5 to Keith Thompson on Fri Jun 14 02:18:45 2024

On 13/06/2024 23:58, Keith Thompson wrote:

bart <[email protected]> writes:

Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

By using a different language, which perhaps you should consider
discussing in a different newsgroup. We discuss C here.

That was my point about the 3 decades it took to do something about it.
In the end nothing really changed.

If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

The format string specified two things. One is to do with the type of an expression, which the compiler knows. After all that's how sometimes it
can tell you you've got it wrong.

And if it can do that, it could also put in the format for you.

Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem. I don't recall anyone saying it isn't. But it's
really not that hard to deal with it as a programmer.

If you have ideas (other than abandoning C) for a flexible
type-safe printing function, by all means share them. What are your suggestions?

A few years ago I played with a "%?" format code in my 'bcc' compiler
and demonstrated it here. The ? gets replaced by some suitable format
code. This is done within the compiler, not the printf library.

For other display control, such as hex output, or to provide other info
such as width, that still needs to be provided as it is done now.

This would cover most of my points except variable format strings, which
you said were not worth worrying about.

Here is a demo:

--------------------------
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    uint64_t a = 0xFFFFFFFF00000000;
    float b = 1.46;
    int c = -67;
    char* d = "Hello";
    int* e = &c;

    for (int i=0; i<100000000; ++i);

    clock_t f = clock();

    printf("%=? %=? %=? %=? %=? %=?\n", a, b, c, d, e, f);
    printf("%=? %=? %=? %=? %=? %=?\n", f, e, d, c, b, a);
}
--------------------------

This prints 6 variables of diverse types with a suitable default format.
Then it prints them in reverse order, without having to change those
format codes.

The '=' is an extra feature which displays the name of the argument.

The output from this was:

A=18446744069414584320 B=1.460000 C=-67 D=Hello E=000000000080FF08 F=219
F=219 E=000000000080FF08 D=Hello C=-67 B=1.460000 A=18446744069414584320

It's not quite as good as my language where it's just:

println =a, =b, =c, =d, =e, =f

but I think it was an interesting experiment. This required 50 lines of
code within my C compiler; a bit more for a full treatment.

Adding `print` as a new keyword so you can use `print foo` is unlikely
to be considered practical; I'd want a much more
general mechanism that's not limited to stdio files. Reasonable new
language features that enable type-safe printf-like functions could
be interesting. I'm not aware of any such proposals for C.

We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

And here it just gets even uglier. You also get situations like this:

uint64_t i=0;
printf("%lld\n", i);

This compiles OK with gcc -Wall, on Windows64. But compile under
Linux64 and it complains the format should be %ld. Change it to %ld,
and it complains under Windows.

It can't tell you that you should be using one of those ludicrous macros.

And you know why, right? uint64_t is a typedef (an alias) for some
existing type, typically either unsigned long or unsigned long long.
If uint64_t is a typedef for unsigned long long, then i is of type
unsigned long long, and the format string is correct.

Sure, that's a language problem. It's unfortunate that code can be
either valid or a constraint violation depending on how the current implementation defines uint64_t. I just don't spend much time
complaining about it.

I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

C99 added <stdint.h>, defining fixed-width and other integer types using existing language features. Sure, there are some disadvantages in the
way it was done. The alternative, creating new language features, would likely have resulted in the proposal not being accepted until some time
after C99, if ever.

I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

The standard allows using an argument of an integer type with a format
of the corresponding type of the other signedness, as long as the value
is in the range of both. (I vaguely recall the standard's wording being
a bit vague on this point.)


From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 12:44:13 2024

Malcolm McLean <[email protected]> writes:

On 14/06/2024 00:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 19:01, bart wrote:

And here it just gets even uglier. You also get situations like this:
   uint64_t i=0;
   printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.

I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop commenting.

and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

No, because we store images as unsigned char buffers. But it's convenient
to pass around colour values in a single variable.

Right. So you don't always use long for "holding rgba values". Another
rule I didn't know.

However there is the worry that accessing rgba channels as bytes rather
than insisting that the buffer be aligned, and accessing as a 32-bit
value,

Which is why I thought you might be including images in the notion of
"holding rgba values".

--
Ben.


From Richard Harnden@21:1/5 to Malcolm McLean on Fri Jun 14 16:32:58 2024

On 14/06/2024 15:30, Malcolm McLean wrote:

Yes, I really need to get that website together so that people cotton on
to what Baby X is, what it can and cannot do, and what is the point.

Is it a shell? A windowing toolkit? A filesystem? A resource compiler?

I have no idea.


From David Brown@21:1/5 to Keith Thompson on Fri Jun 14 19:13:42 2024

On 14/06/2024 01:47, Keith Thompson wrote:

David Brown <[email protected]> writes:
[...]

Certainly "-O3" is rarely worth it unless you are also using a
"-march=" flag (such as "-fmarch=native") to tune for a particular
processor and enable stuff like vectorisation. Getting the fastest
code is more of an art than a science!

Typo: it's "-march=native".

Thanks.


From David Brown@21:1/5 to Keith Thompson on Fri Jun 14 19:08:13 2024

On 14/06/2024 00:58, Keith Thompson wrote:

bart <[email protected]> writes:

On 13/06/2024 16:39, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 01:33, Keith Thompson wrote:

If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

C23 also adds explicit width length modifiers. So instead of having to
guess if uint64_t is "%llu" or "%lu" on a particular platform, or using
the PRIu64 macro, you can now use "%w64u" for uint64_t (or
uint_least64_t if the exact width type does not exist). I think that's
about as neat as you could get, within the framework of printf.
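
For reference, a minimal sketch of the portable pre-C23 spellings discussed in this thread: PRIu64 from <inttypes.h> for uint64_t and "%zu" for size_t, both standard since C99. The helper names format_u64 and format_size are hypothetical; they write into a caller-supplied buffer only so the result is easy to inspect. The C23 "%w64u" form is not shown because library support for it may still vary.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a uint64_t in decimal. PRIu64 expands to the right length
   modifier plus 'u' for this platform ("lu", "llu", ...), and adjacent
   string literals are concatenated by the compiler. */
int format_u64(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "%" PRIu64, value);
}

/* Format a size_t in decimal. The 'z' length modifier exists for
   exactly this type, so no macro is needed. */
int format_size(char *buf, size_t size, size_t value)
{
    return snprintf(buf, size, "%zu", value);
}
```

The same format strings work with printf() directly, e.g. printf("%" PRIu64 "\n", i); the unsigned conversion also avoids the signed/unsigned mismatch noted earlier in the thread.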

Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem. I don't recall anyone saying it isn't. But it's
really not that hard to deal with it as a programmer.

It is particularly easy if you have a decent compiler and know how to
enable the right warning flags!

If you have ideas (other than abandoning C) for a flexible
type-safe printing function, by all means share them. What are your suggestions? Adding `print` as a new keyword so you can use `print
foo` is unlikely to be considered practical; I'd want a much more
general mechanism that's not limited to stdio files. Reasonable new
language features that enable type-safe printf-like functions could
be interesting. I'm not aware of any such proposals for C.

It is possible to come a long way with variadic macros and _Generic.
You can at least end up being able to write something like :

int x = 123;
const char * s = "Hello, world!";
uint64_t u = 0x4242;

Print("X = ", x, " the string is ", s, " and u = 0x",
as_hex(u, 6), newline);

rather than:

printf("X = %i the string is %s and u = 0x%06" PRIx64 "\n", x, s, u);

Which you think is better is a matter of opinion.
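
A rough sketch of the _Generic half of that idea, assuming a C11 compiler. It is not the Print macro described above: FORMAT_VALUE and the fmt_* helpers are hypothetical names, it handles one value at a time, and it writes into a buffer rather than to stdout. The point is only that the compiler, not the programmer, picks the conversion from the argument's type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One small formatter per supported type (all names are hypothetical). */
static inline int fmt_int(char *b, size_t n, int v)                   { return snprintf(b, n, "%d", v); }
static inline int fmt_uint(char *b, size_t n, unsigned v)             { return snprintf(b, n, "%u", v); }
static inline int fmt_long(char *b, size_t n, long v)                 { return snprintf(b, n, "%ld", v); }
static inline int fmt_ulong(char *b, size_t n, unsigned long v)       { return snprintf(b, n, "%lu", v); }
static inline int fmt_llong(char *b, size_t n, long long v)           { return snprintf(b, n, "%lld", v); }
static inline int fmt_ullong(char *b, size_t n, unsigned long long v) { return snprintf(b, n, "%llu", v); }
static inline int fmt_double(char *b, size_t n, double v)             { return snprintf(b, n, "%g", v); }
static inline int fmt_str(char *b, size_t n, const char *v)           { return snprintf(b, n, "%s", v); }

/* Select the formatter from the static type of x. uint64_t needs no entry
   of its own: it is a typedef for unsigned long or unsigned long long,
   and both are listed. */
#define FORMAT_VALUE(buf, n, x) _Generic((x), \
        int:                fmt_int,    \
        unsigned:           fmt_uint,   \
        long:               fmt_long,   \
        unsigned long:      fmt_ulong,  \
        long long:          fmt_llong,  \
        unsigned long long: fmt_ullong, \
        float:              fmt_double, \
        double:             fmt_double, \
        char *:             fmt_str,    \
        const char *:       fmt_str)((buf), (n), (x))
```

A variadic Print(...) would wrap something like this in a macro that applies it to each argument in turn; a type with no entry in the list is a compile-time error rather than undefined behaviour at run time.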

I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

I too would like such a typedef. Using it for uint64_t would cause
problems for /existing/ uses of _Generic, but would make future uses better.


From Scott Lurndal@21:1/5 to David Brown on Fri Jun 14 17:36:19 2024

David Brown <[email protected]> writes:

On 13/06/2024 16:38, DFS wrote:

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I'd likely turn to R instead of Python for that.


From David Brown@21:1/5 to DFS on Fri Jun 14 19:18:35 2024

On 13/06/2024 16:38, DFS wrote:

On 6/13/2024 9:21 AM, David Brown wrote:

On 13/06/2024 00:34, DFS wrote:

On 6/12/2024 6:22 PM, Keith Thompson wrote:

Janis Papanagnou <[email protected]> writes:

On 12.06.2024 22:47, DFS wrote:

[...]

before: char outliers[100];
after : char outliers[100] = "";

[...]

Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

This:
     char outliers[100] = "";
initializes all 100 elements to zero. So does this:
     char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to
zero.

Yes. It's good to point that out, since people might assume that
using a string literal here only initialises the bit covered by that
string literal.

(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy()
is often inlined for such uses.
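
A small sketch of why the initializer matters when a buffer is built up piece by piece. The original program was not posted in this part of the thread, so this is an assumption about the kind of code involved; append_int is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append " <value>" to the string already held in buf (capacity: size).
   Returns 0 on success, -1 if the text did not fit.
   Precondition: buf must already contain a terminated string, because
   strlen() reads until it finds a '\0'. On an uninitialised array that
   read is undefined behaviour. */
int append_int(char *buf, size_t size, int value)
{
    size_t used = strlen(buf);
    int n = snprintf(buf + used, size - used, " %d", value);
    return (n < 0 || (size_t)n >= size - used) ? -1 : 0;
}
```

With char outliers[100] = ""; the first call sees an empty string and appends at offset 0. With a plain char outliers[100]; the first strlen() runs over whatever bytes happen to be on the stack, so the result can change with the data set, the compiler flags, or nothing visible at all.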

Thanks. I'll have to remember these things. I like to use char arrays.
The problem is I don't use C very often, so I don't develop muscle
memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I like C for its blazing speed. Very addicting. And it's much more
challenging/frustrating than python.

With small data sets, Python has blazing speed - /every/ language has
blazing speed. And for large data sets, use numpy on Python and you
/still/ have blazing speeds - a lot faster than anything you would write
in C (because numpy's underlying code is written in C by people who are
much better at writing fast numeric code than you or I).

The only reason to use C for something like this is for the challenge and
fun, which is fair enough.

I coded a subset (8 stat measures) of this C program 3.5 years ago, and recently decided to finish duplicating all 23 stats shown at:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

Working on the outliers code, I decided to add an option to generate
data with consecutive numbers. That's when I ran $./dfs 50 -c and
noticed every value above 40 was considered an outlier. And this didn't change over a bunch of code edits/file saves/compiles.

Understanding how an uninitialized variable caused that persistent issue
is beyond my pay grade.

Understanding that you should not read from a variable that has never
been given a value is well within the pay grade of every programmer.
And it's something that every C programmer should understand. (And now
you understand it too!)

That's when I whined to clc. Before I even posted, though, I spotted
the uninitialized var (outliers). Later I spotted another one (mode).

One led to 'undefined behavior', the other to 'stack smashing'. Both
only occurred when using consecutive numbers.

But with y'all's help I believe I found and fixed ALL issues. I can
dream anyway.


From bart@21:1/5 to Malcolm McLean on Fri Jun 14 19:31:29 2024

On 14/06/2024 19:06, Malcolm McLean wrote:

Baby X FS - the filing system - code that allows you to create a
virtual drive on your computer and access files from it using special fopen(), fclose() functions, but standard library functions like
fprintf() or fgetc() for the other operations

I think what people don't get is why they should use this filing system, when
they already have a perfectly good one within their OS on which fopen()
etc already work.

When you do fclose() after writing a file, will it get written to some
persistent media?

Because either it stays in memory (dangerous if your machine crashes, or
someone just turns it off), or it gets written to the same SSD/SD/HDD
media that the real OS uses. In which case, what is the point?

I gather this is not any kind of OS with its own drivers for the
peripherals on the machine, that takes over the real OS, or runs as some
kind of virtual machine.


From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 22:29:00 2024

Malcolm McLean <[email protected]> writes:

On 14/06/2024 12:44, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 14/06/2024 00:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 19:01, bart wrote:

And here it just gets even uglier. You also get situations like this:
   uint64_t i=0;
   printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.

...

However if you need to pass a colour value to a function, you normally pass a BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you query the channels using the macros in bbx_color.h

#ifndef bbx_color_h
#define bbx_color_h

typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

#endif

The last macro is to make it easier to interface with Xlib, and has the prefix BBX_ (upper case) indicating that it is for internal use by the bbx library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.
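
To make the shift problem concrete: in bbx_rgba(r,g,b,a) the arguments are typically int, so (r) << 24 is evaluated in int, and with a 32-bit int any r >= 128 shifts a 1 into the sign bit, which is undefined. Converting to an unsigned 32-bit type before shifting avoids that. The following is a sketch under stated assumptions, not the Baby X header: the names rgba32, rgba_pack and rgba_red etc. are hypothetical, and inline functions are used instead of macros so that arguments are converted once and parenthesisation is not an issue.

```c
#include <stdint.h>

/* Hypothetical stand-in for an opaque 32-bit RGBA value. */
typedef uint32_t rgba32;

/* Pack four channels. Each channel is masked to 8 bits and converted to
   the unsigned 32-bit type BEFORE the shift, so no shift is ever done on
   a signed value. */
static inline rgba32 rgba_pack(unsigned r, unsigned g, unsigned b, unsigned a)
{
    return ((rgba32)(r & 0xFFu) << 24) |
           ((rgba32)(g & 0xFFu) << 16) |
           ((rgba32)(b & 0xFFu) << 8)  |
            (rgba32)(a & 0xFFu);
}

/* Channel accessors. */
static inline unsigned rgba_red(rgba32 c)   { return (c >> 24) & 0xFFu; }
static inline unsigned rgba_green(rgba32 c) { return (c >> 16) & 0xFFu; }
static inline unsigned rgba_blue(rgba32 c)  { return (c >> 8)  & 0xFFu; }
static inline unsigned rgba_alpha(rgba32 c) { return c & 0xFFu; }
```

A macro version can apply the same fix by writing ((BBX_RGBA)(r) << 24) and so on, i.e. casting each operand rather than the finished expression.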

--
Ben.


From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Fri Jun 14 22:32:12 2024

"Chris M. Thomasson" <[email protected]> writes:

Fwiw, I remember doing a channel based hit map that stored an image using RGBA but used floats. Each pixel would have a hit:

struct hit
{
float m_color[4];
};

It would take all of the hits and depending on what was going on during iteration it would increment parts of hit::m_color[4].

Not in C you didn't!

--
Ben.


From DFS@21:1/5 to David Brown on Fri Jun 14 19:05:49 2024

On 6/14/2024 1:18 PM, David Brown wrote:

On 13/06/2024 16:38, DFS wrote:

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I like C for its blazing speed. Very addicting. And it's much more
challenging/frustrating than python.

With small data sets, Python has blazing speed - /every/ language has
blazing speed. And for large data sets, use numpy on Python and you
/still/ have blazing speeds - a lot faster than anything you would write
in C (because numpy's underlying code is written in C by people who are
much better at writing fast numeric code than you or I).

The only reason to use C for something like this is for the challenge and
fun, which is fair enough.

It was fun, especially when I got every stat to match the website exactly.

I just now ported that C stats program to python. The original C took
me ~2.5 days to write and test.

The port to python then took about 2 hours.

It mainly consisted of replacing printf with print, removing brackets
{}, changing vars max and min to dmax and dmin, dropping the \n from
printf's, replacing fabs() with abs(), etc.

Line count dropped about 20%.

During conversion, I got a Python error I don't remember seeing in the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

My initial fix was this clunk (convert to int()):

# median and quartiles
# quartiles divide sorted dataset into four sections
# Q1 = median of values less than Q2
# Q2 = median of the data set
# Q3 = median of values greater than Q2
if N % 2 == 0:
    Q2 = median = (nums[int((N/2)-1)] + nums[int(N/2)]) / 2.0
    i = int(N/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + ((i-1)/2))] + nums[int(i+(i/2))]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i-1)/2))]

if N % 2 != 0:
    Q2 = median = nums[int((N-1)/2)]
    i = int((N-1)/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + (i/2))] + nums[int(i + (i/2) + 1)]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i+1)/2))]

And then with some substitution:

if N % 2 == 0:
    i = int(N/2)
    Q2 = median = (nums[i - 1] + nums[i]) / 2.0
    x = int(i/2)
    y = int((i-1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + y] + nums[i + x]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + y]

if N % 2 != 0:
    i = int((N-1)/2)
    Q2 = median = nums[i]
    x = int(i/2)
    y = int((i-1)/2)
    z = int((i+1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + x] + nums[i + x + 1]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + z]

How would you do it?

If you have an easy to apply formula for computing the quartiles, let's
hear it!
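
One possible answer, sketched in C since that is the language of the original program. All four branches above compute the same thing: Q1 is the median of the lower half, Q3 the median of the upper half, and the middle element is left out of both halves when N is odd. With a single "median of a sorted range" helper the case analysis collapses. The function names are hypothetical, and the input is assumed to be sorted already.

```c
#include <stddef.h>

/* Median of a sorted array of n >= 1 values. */
static double median_sorted(const double *a, size_t n)
{
    return (n % 2 != 0) ? a[n / 2]
                        : (a[n / 2 - 1] + a[n / 2]) / 2.0;
}

/* Fill q[0], q[1], q[2] with Q1, Q2 (the median) and Q3 of a sorted array.
   Returns 0 on success, -1 if n < 2 (each half would be empty). */
int quartiles_sorted(const double *a, size_t n, double q[3])
{
    if (n < 2)
        return -1;

    size_t half = n / 2;    /* integer division: number of values in each half */

    q[0] = median_sorted(a, half);               /* lower half: a[0 .. half-1] */
    q[1] = median_sorted(a, n);                  /* whole data set */
    q[2] = median_sorted(a + (n - half), half);  /* upper half: the last 'half' values,
                                                    which skips the middle one when n is odd */
    return 0;
}
```

For the values 1 to 8 this gives 2.5, 4.5 and 6.5; for 1 to 7 it gives 2, 4 and 6, the same as the index arithmetic in the post. Other quartile conventions exist (for example ones that interpolate), so this matches the method above rather than every calculator.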


From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Jun 15 00:14:22 2024

Malcolm McLean <[email protected]> writes:

On 14/06/2024 22:29, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 14/06/2024 12:44, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 14/06/2024 00:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 19:01, bart wrote:

And here it just gets even uglier. You also get situations like this:
   uint64_t i=0;
   printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.
...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h

#ifndef bbx_color_h
#define bbx_color_h

typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

Why use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

No, it's been miswritten. Which is what I mean about C's integer types
being a source of bugs. That code does not look buggy, but it is.

I have no idea what this means. You start with "no" but I can't work
out what you think is wrong about what I said. And what does "has been miswritten" mean? Both the tense and the use of "miswritten" are
confusing to me. And, to me, the code does look "buggy".

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

#endif

The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

They're function-like macros. Iterating over an rgba buffer is very
processor-intensive, and so we do have to compromise safety for speed
here.

I am not suggesting otherwise.

All function-like symbols bbx_ are provided by Baby X for users, all
symbols BBX_ have that prefix to reduce the chance of collisions with other code.

Clearly. I'm not sure why you have reiterated this. I did not intend
to change your mind, just to point out that it's the reverse of the
common convention.

--
Ben.


From DFS@21:1/5 to Keith Thompson on Fri Jun 14 23:49:37 2024

On 6/14/2024 9:39 PM, Keith Thompson wrote:

DFS <[email protected]> writes:
[...]

During conversion, I got a Python error I don't remember seeing in the past:
"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]

C's "/" operator yields a result with the type of the operands (after promotion to a common type).

Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the array addressing results in an integer.

After casting i to an int before any array addressing, // works.

Thanks


From DFS@21:1/5 to Keith Thompson on Sat Jun 15 00:45:24 2024

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

On 6/14/2024 9:39 PM, Keith Thompson wrote:

DFS <[email protected]> writes:
[...]

During conversion, I got a Python error I don't remember seeing in the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.

After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is to set i = int(N/2)

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.

Rules for thee but not for me?


From Janis Papanagnou@21:1/5 to DFS on Sat Jun 15 07:03:16 2024

On 15.06.2024 06:45, DFS wrote:

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

Given what Keith suggested, and assuming N is an integer, wouldn't it
be more sensible to use the int division operator '//' and just write
i = N // 2 ? I mean, why do a float division on integer operands and
then again coerce the result to int again?

Janis

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Kuyper@21:1/5 to DFS on Sat Jun 15 01:05:16 2024

On 6/15/24 00:45, DFS wrote:

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

On 6/14/2024 9:39 PM, Keith Thompson wrote:

...

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

Alternatively, i = N//2

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

Keep in mind that he's just one Python developer. With all due respect
to David, you're likely to get better answers to your Python questions
by going to a Python forum filled with Python developers.
It's not about "following the rules" - rules are meaningless when
enforcement is impossible, as it is in an unmoderated newsgroup like
this one. It's about getting the best possible answer to your questions.
If you prefer to get lower-quality answers to your Python questions,
continue asking them in forums where they are off-topic - but why would
you prefer that?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From bart@21:1/5 to DFS on Sat Jun 15 09:37:06 2024

On 15/06/2024 05:45, DFS wrote:

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

On 6/14/2024 9:39 PM, Keith Thompson wrote:

DFS <[email protected]> writes:
[...]

During conversion, I got a Python error I don't remember seeing in
the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.

After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

Yeah do that. Set up a private corner of comp.lang.c where David Brown
has a sideline answering questions about Python from only one poster.

Nobody else is allowed to answer.

Sounds ridiculous, yes?

In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.

Rules for thee but not for me?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From DFS@21:1/5 to Janis Papanagnou on Sat Jun 15 07:39:40 2024

On 6/15/2024 1:03 AM, Janis Papanagnou wrote:

On 15.06.2024 06:45, DFS wrote:

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

Given what Keith suggested, and assuming N is an integer, wouldn't it
be more sensible to use the int division operator '//' and just write
i = N // 2 ? I mean, why do a float division on integer operands and
then again coerce the result to int again?

Python bytecode
$ python3 -m dis file.py

i = N//2
1068 LOAD_NAME 12 (N)
1070 LOAD_CONST 10 (2)
1072 BINARY_FLOOR_DIVIDE
1074 STORE_NAME 10 (i)

i = int(N/2)
1068 LOAD_NAME 11 (int)
1070 LOAD_NAME 12 (N)
1072 LOAD_CONST 10 (2)
1074 BINARY_TRUE_DIVIDE
1076 CALL_FUNCTION 1
1078 STORE_NAME 10 (i)

Fewer ops is better, so I'll go with your suggestion. Good catch.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Malcolm McLean on Sat Jun 15 20:57:49 2024

On 15/06/2024 00:35, Malcolm McLean wrote:

On 14/06/2024 22:29, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 14/06/2024 12:44, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 14/06/2024 00:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 19:01, bart wrote:

And here it just gets even uglier. You also get situations like this:
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite, etc. etc. all return size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.

...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h

#ifndef bbx_color_h
#define bbx_color_h

typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

No, it's been miswritten. Which is what I mean about C's integer types
being a source of bugs. That code does not look buggy, but it is.

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

#endif

The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

They're function-like macros. Iterating over an rgba buffer is very
processor-intensive, and so we do have to compromise safety for speed
here. All function-like symbols bbx_ are provided by Baby X for users,
all symbols BBX_ have that prefix to reduce the chance of collisions
with other code.

In this little exchange, there have been several points where your code
is unclear, inefficient, non-portable or downright buggy, purely due to
your insistence on using an outdated version of C.

If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer, write:

    typedef uint32_t BBX_RGBA;

If you want bbx_rgba() to be a function that is typesafe, correct, and efficient (for any decent compiler), write:

    static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
            uint32_t b, uint32_t a)
    {
        return (r << 24) | (g << 16) | (b << 8) | a;
    }

If you want your colour types to be "opaque", as you claimed, make it a
struct with inline accessor functions.
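
A minimal sketch of what a struct with inline accessors might look like. The member name v is an assumption made for illustration; the accessor names and the byte layout (red in the top byte) follow the macros quoted above. The struct definition is still visible in the header, so this only stops a colour being used as a plain integer by accident; it does not hide the representation.

```c
#include <stdint.h>

/* Wrapping the value in a struct means a colour no longer converts
   implicitly to or from an ordinary integer. */
typedef struct { uint32_t v; } BBX_RGBA;

static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
        uint32_t b, uint32_t a)
{
    BBX_RGBA c = { (r << 24) | (g << 16) | (b << 8) | a };
    return c;
}

/* Accessors: each extracts one 8-bit channel. */
static inline uint32_t bbx_red(BBX_RGBA c)   { return (c.v >> 24) & 0xFF; }
static inline uint32_t bbx_green(BBX_RGBA c) { return (c.v >> 16) & 0xFF; }
static inline uint32_t bbx_blue(BBX_RGBA c)  { return (c.v >> 8) & 0xFF; }
static inline uint32_t bbx_alpha(BBX_RGBA c) { return c.v & 0xFF; }
```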

Use static inline functions instead of function-like macros and you
don't need the extra parentheses round things (and you don't need to
justify to readers why they are not there). You can use small letter
names without running contrary to common conventions.

Your insistence on hobbling your choice of language shows through in the
poor quality of the code - or at least, the missed opportunities to make
the code better and safer for both you and your users.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Harnden@21:1/5 to David Brown on Sat Jun 15 20:27:25 2024

On 15/06/2024 19:57, David Brown wrote:

If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer, write:

    typedef uint32_t BBX_RGBA;

If you want bbx_rgba() to be a function that is typesafe, correct, and efficient (for any decent compiler), write :

    static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
            uint32_t b, uint32_t a)
    {
        return (r << 24) | (g << 16) | (b << 8) | a;
    }

Shouldn't that be ... ?

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
uint8_t b, uint8_t a)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Scott Lurndal on Sat Jun 15 22:15:09 2024

On 14/06/2024 19:36, Scott Lurndal wrote:

David Brown <[email protected]> writes:

On 13/06/2024 16:38, DFS wrote:

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I'd likely turn to R instead of Python for that.

The only thing I know about R is that it would be a good choice for
statistics code if I knew R. Since I don't know anything more about R,
I'd go for Python :-)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Keith Thompson on Sat Jun 15 22:13:24 2024

On 14/06/2024 21:34, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 14/06/2024 00:58, Keith Thompson wrote:

bart <[email protected]> writes:

On 13/06/2024 16:39, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 13/06/2024 01:33, Keith Thompson wrote:

If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

C23 also adds explicit width length modifiers. So instead of having
to guess if uint64_t is "%llu" or "%lu" on a particular platform, or
using the PRIu64 macro, you can now use "%w64u" for uint64_t (or
uint_least64_t if the exact width type does not exist). I think
that's about as neat as you could get, within the framework of printf.
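
For comparison, the pre-C23 spelling with the PRIu64 macro can be sketched like this. The helper name format_u64 is hypothetical; PRIu64 and snprintf are standard. Where the C library supports the C23 modifiers, the format string could presumably be written "%w64u" instead, but that is not exercised here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes v in decimal into buf and returns snprintf's result.
   PRIu64 expands to whichever length modifier and conversion
   (for example "lu" or "llu") matches uint64_t on this platform. */
int format_u64(char *buf, size_t size, uint64_t v)
{
    return snprintf(buf, size, "%" PRIu64, v);
}
```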

Note that the new "%wN" modifier applies only to [u]intN_t and [u]int_leastN_t types, not to all integer types with a width of N bits.

Yes, but by the definition of these types, if uintN_t exists then uint_leastN_t is the same type. (This is a new detail in C23.)

The standard doesn't guarantee that integer types with the same representation are interchangeable, so for example printf("%d", 0L) and printf("%ld", 0) both have undefined behavior. An implementation would probably have to go out of its way to make either of those do anything
other than printing "0", but the behavior is still undefined (i.e., the standard doesn't guarantee it will work).

True. For C17, uint32_t and uint_least32_t could be different and
incompatible types. It's highly unlikely, but possible. That was fixed
in C23. (7.22.1.1p3)

That's still the case in C23, even for the %wN modifiers. For a typical implementation with 32-bit integer types, uint32_t and uint_least32_t
will be the same type (C17 doesn't require that), and "%w32u" will work
with that type. It's not guaranteed to work with any other 32-bit
unsigned type. For an implementation that doesn't have any 32-bit
integer type, uint32_t won't exist, uint_least32_t will be, say, 64
bits, and "%w32u" will work with *that* type.

Yes, that is correct.

It can be surprising for some people to hear that types with identical
size and characteristics can still be incompatible. But at least with
C23, we don't have to worry about that for the uintN_t and uint_leastN_t
types. (The same applies to the signed versions.)

If you want to use the bit width length modifiers in C23 printf, you
might still have to cast your "int" or "long" data to an appropriate
intN_t or int_leastN_t.

That covers the exact-width and "least" types. The "%wfN" modifiers
cover the "fast" types.

So if you want to use C23's new "%wN" modifiers, you have to use the
types defined in <stdint.h> if you want to avoid undefined behavior.

Yes. But if you want particular sizes for your types, that's a good
idea anyway.

On the other hand, though `int n = 42; printf("%w32d\n", n);` has
undefined behavior, it's very very likely to work if int is 32 bits.
(`gcc -Wformat` warns about using "%ld" with a long long argument
even when long and long long have the same size, but not about using
"%w32d" with a 32-bit int argument.)

The new modifiers are supported in glibc 2.39, which is included in
Ubuntu 24.04. They're not supported in newlib (used by Cygwin) or in
MS Visual Studio 2022.

[...]

I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

I too would like such a typedef. Using it for uint64_t would cause
problems for /existing/ uses of _Generic, but would make future uses
better.

Currently, there are (in the absence of extended integer types) only a
finite number of incompatible integer types. This makes it possible to
write a _Generic expression that accepts an operand of any integer type, which can be useful if you have an integer typedef and don't know the underlying type. This new kind of typedef would allow programmers to introduce an unlimited number of new incompatible integer types.
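
A sketch of the kind of _Generic expression meant here, assuming an implementation with no extended integer types. The macro name INT_TYPE_NAME is hypothetical. There is deliberately no default association, so an operand of a non-integer type fails to compile.

```c
/* Maps an expression of any standard integer type to the name of
   that type. A typedef such as uint64_t resolves to whichever
   standard type it aliases on the platform. */
#define INT_TYPE_NAME(x) _Generic((x),          \
    _Bool:              "_Bool",                \
    char:               "char",                 \
    signed char:        "signed char",          \
    unsigned char:      "unsigned char",        \
    short:              "short",                \
    unsigned short:     "unsigned short",       \
    int:                "int",                  \
    unsigned int:       "unsigned int",         \
    long:               "long",                 \
    unsigned long:      "unsigned long",        \
    long long:          "long long",            \
    unsigned long long: "unsigned long long")
```

A typedef that created a genuinely new type would not match any of these associations, which is the breakage being described.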

Yes. But it would also allow you to make a "strong typedef" for a
particular use and have a _Generic that distinguishes it. I believe I
would find that more useful than the disadvantage you describe.
(Perhaps it would be even better if it were possible to extend
_Generic's, rather than cover all the types in one go.)

I haven't seen a lot of code that does that kind of thing, and none
that I didn't write myself.

Perhaps if this is introduced, there should be a way to determine the underlying type. C23 introduces typeof and typeof_unqual; perhaps we
could have typeof_underlying. It could also apply to enum types.

Interesting idea.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to DFS on Sat Jun 15 22:22:09 2024

On 15/06/2024 06:45, DFS wrote:

On 6/14/2024 11:56 PM, Keith Thompson wrote:

DFS <[email protected]> writes:

On 6/14/2024 9:39 PM, Keith Thompson wrote:

DFS <[email protected]> writes:
[...]

During conversion, I got a Python error I don't remember seeing in
the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.

After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

I consider myself more of a C developer than a Python developer, but I
use Python regularly. I would say that my knowledge of the C language
and standard, while not as deep as some others here, covers a far higher proportion of the language than my knowledge of Python covers of Python.
But I think you can make good use of Python while knowing a smaller
fraction of the language and library than for C.

In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.

Rules for thee but not for me?

If occasional questions or discussions about other languages pop up
here, people will often answer them. But for more in-depth discussions
or questions, this is not the newsgroup - comp.lang.python is the place
for Python questions. (You'll also probably get better answers there
than I can give.)

The rules are for everyone, but they are a bit fuzzy. (And different
posters have different levels of fuzziness.)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ben Bacarisse@21:1/5 to Richard Harnden on Sat Jun 15 23:13:01 2024

Richard Harnden <[email protected]d> writes:

On 15/06/2024 19:57, David Brown wrote:

If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer,
write:
    typedef uint32_t BBX_RGBA;
If you want bbx_rgba() to be a function that is typesafe, correct, and
efficient (for any decent compiler), write :
    static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
            uint32_t b, uint32_t a)
    {
        return (r << 24) | (g << 16) | (b << 8) | a;
    }

Shouldn't that be ... ?

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
uint8_t b, uint8_t a)

Maybe, but the function then needs more care as uint8_t will promote to
int and then r << 24 can be undefined. One needs

((BBX_RGBA)r << 24) | (g << 16) | (b << 8) | a

(assuming that int is never going to be 16 bits or the same issue comes
up with the g << 16 shift). Given this assumption, I'd just check that
unsigned int is at least 32 bits and use that for BBX_RGBA.
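
Put together, a uint8_t version might look like the sketch below. It casts every channel rather than only r, which is a small variation on the expression above and sidesteps the 16-bit int question for g and b as well. It assumes uint8_t and uint32_t exist on the target.

```c
#include <stdint.h>

typedef uint32_t BBX_RGBA;

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
        uint8_t b, uint8_t a)
{
    /* Convert each channel to the 32-bit unsigned type before
       shifting, so no shift is performed on a promoted signed int. */
    return ((BBX_RGBA)r << 24) | ((BBX_RGBA)g << 16)
         | ((BBX_RGBA)b << 8)  |  (BBX_RGBA)a;
}
```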

--
Ben.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Richard Harnden on Sun Jun 16 12:53:38 2024

On 15/06/2024 21:27, Richard Harnden wrote:

On 15/06/2024 19:57, David Brown wrote:

If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer,
write:

     typedef uint32_t BBX_RGBA;

If you want bbx_rgba() to be a function that is typesafe, correct, and
efficient (for any decent compiler), write :

     static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
             uint32_t b, uint32_t a)
     {
         return (r << 24) | (g << 16) | (b << 8) | a;
     }

Shouldn't that be ... ?

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
        uint8_t b, uint8_t a)

As Ben says, that will not work on its own - "r" would get promoted to
signed int before the shift, and we are back to undefined behaviour.

I think there is plenty of scope for improvement in a variety of ways, depending on what the author is looking for. For example, uint8_t might
not exist on all platforms (indeed there are current processors that
don't support it, not just dinosaur devices). But any system that
supports a general-purpose gui, such as Windows or *nix systems, will
have these types and will also have a 32-bit int. So the code author
can balance portability with convenient assumptions.

There are also balances to be found between run-time checking and
efficiency, and how to handle bad data. If the function can assume that
no one calls it with values outside 0..255, or that it doesn't matter
what happens if such values are used, then you don't need any checks.
As it stands, with uint32_t parameters, out-of-range values will lead to
fully defined but wrong results. Switching to "uint8_t" types would
give a different fully defined but wrong result. Maybe the function
should use saturation, or run-time checks and error messages - that will
depend on where it is in the API, what the code author wants, and what
users expect.
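
As one illustration of the saturation option, here is a sketch that clamps each channel to 0..255 before packing. The names bbx_clamp8 and bbx_rgba_sat are hypothetical and not part of Baby X; whether clamping is the right policy is exactly the API decision described above.

```c
#include <stdint.h>

typedef uint32_t BBX_RGBA;

/* Clamp a channel value to the range 0..255. */
static inline uint32_t bbx_clamp8(uint32_t v)
{
    return v > 255 ? 255 : v;
}

/* Pack four channels, saturating out-of-range inputs instead of
   letting them spill into neighbouring channels. */
static inline BBX_RGBA bbx_rgba_sat(uint32_t r, uint32_t g,
        uint32_t b, uint32_t a)
{
    return (bbx_clamp8(r) << 24) | (bbx_clamp8(g) << 16)
         | (bbx_clamp8(b) << 8)  |  bbx_clamp8(a);
}
```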

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Tim Rentsch@21:1/5 to Keith Thompson on Tue Jun 18 17:23:09 2024

Keith Thompson <[email protected]> writes:

(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

It seems unnecessary given that the upcoming C standard
is choosing to mandate two's complement for all signed
integer types.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Tim Rentsch@21:1/5 to Keith Thompson on Sat Jun 22 09:28:14 2024

Keith Thompson <[email protected]> writes:

Tim Rentsch <[email protected]> writes:

Keith Thompson <[email protected]> writes:

(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

It seems unnecessary given that the upcoming C standard
is choosing to mandate two's complement for all signed
integer types.

It's less necessary, but I'd still like to see it.

These days, strings very commonly hold UTF-8 data. The fact that bytes
whose values exceed 127 are negative is conceptually awkward, even if everything happens to work. It rarely if ever makes sense to treat a character value as negative.
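
A small sketch of the usual workaround today: convert each byte to unsigned char before comparing, so the result does not depend on whether plain char is signed. The function name count_non_ascii is hypothetical.

```c
#include <stddef.h>

/* Counts the bytes in s that lie outside the 7-bit ASCII range,
   i.e. the bytes that belong to multi-byte UTF-8 sequences. */
size_t count_non_ascii(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Without the cast, a byte such as 0xC3 compares as a
           negative value where plain char is signed. */
        if ((unsigned char)*s >= 0x80)
            n++;
    }
    return n;
}
```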

The combination of mandating two's complement and using a compiler
option like -funsigned-char (supported by both gcc and clang)
should be enough to do what you want.

(And of course signed char still exists,
or int8_t if you prefer 8 bits vs. CHAR_BIT bits.)

It makes me laugh when people use int8_t instead of signed char.
If CHAR_BIT isn't 8 then there won't be any int8_t. And of
course we can always throw in a static assertion if it is felt
necessary to protect against implementations that don't have
8-bit chars. (A static assertion also can verify that two's
complement is being used for signed char.)
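
Such static assertions could look like the following sketch, using C11's _Static_assert. The second check relies on the fact that ones' complement and sign-and-magnitude representations both give SCHAR_MIN equal to -SCHAR_MAX.

```c
#include <limits.h>

/* Refuse to compile where char is not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

/* Refuse to compile unless signed char has the extra negative value
   that only two's complement provides. */
_Static_assert(SCHAR_MIN == -SCHAR_MAX - 1,
               "this code assumes two's complement signed char");
```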

A drawback is that it could break existing (non-portable) code that
assumes plain char is signed.

Exactly! No reason to break the whole world when you can get
what you want just by using a compiler option.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
