• "undefined behavior"?

    From DFS@21:1/5 to All on Wed Jun 12 16:47:23 2024
    Wrote a C program to mimic the stats shown on:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

    My code compiles and works fine - every stat matches - except for one
    anomaly: when using a dataset of consecutive numbers 1 to N, all values > 40
    are flagged as outliers. Up to 40, no problem. Random numbers
    dataset of any size: no problem.

    And values 41+ definitely don't meet the conditions for outliers (using
    the IQR * 1.5 rule).

    Very strange.

    Edit: I just noticed I didn't initialize a char:
    before: char outliers[100];
    after : char outliers[100] = "";

    And the problem went away. Reset it to before and problem came back.

    Makes no sense. What could cause the program to go FUBAR at data point
    41+ only when the dataset is consecutive numbers?

    Also, why doesn't gcc just do you a solid and initialize to "" for you?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry Schwarz@21:1/5 to DFS on Wed Jun 12 14:30:26 2024
    On Wed, 12 Jun 2024 16:47:23 -0400, DFS wrote:

    [...]

    Makes perfect sense. The first rule of undefined behavior is
    "Whatever happens is exactly correct." You are not entitled to any expectations and none of the behavior (or perhaps all of the behavior)
    can be called unexpected.

    Since we cannot see your code, I will guess that you use a non-zero
    value in outliers[i] to indicate that the corresponding value has been identified as an outlier. Since you did not initialize the array
    outliers, you have no idea what indeterminate value any element of the
    array contains when your program begins execution. Apparently some of
    them are non-zero. The fact that the first 40 are zero and the
    remaining non-zero is merely an artifact of how your system builds
    this particular program with that particular set of compile and link
    options. Change anything and you could see completely different
    behavior, or not.

    I don't use gcc but, in debug mode, some compilers will put
    recognizable "garbage values" in uninitialized variables so you can
    spot the condition more easily.

    In any case, the C language does not prevent you from shooting
    yourself in the foot if you choose to. Evaluating an indeterminate
    value is one fairly common way to do this.

    --
    Remove del for email

  • From David Brown@21:1/5 to DFS on Wed Jun 12 23:38:45 2024
    On 12/06/2024 22:47, DFS wrote:
    [...]


    It is /really/ difficult to know exactly what your problem is without
    seeing your C code! There may be other problems that you haven't seen yet.

    Non-static local variables without initialisers have "indeterminate"
    values. Trying to use these "indeterminate" values is undefined
    behaviour - you have absolutely no control over what might happen.
    Any particular behaviour you see is down to luck from the rest of the
    code and what happened to be in memory at the time.

    There is no automatic initialisation of non-static local variables,
    because that would often be inefficient. The best way to avoid errors
    like yours, IMHO, is not to declare such variables until you have data
    to put in them - thus you always have a sensible initialiser of real
    data. Occasionally that is not practical, but it works in most cases.

    For a data array, zero initialisation is common. Typically you do this
    with :

    int xs[100] = { 0 };

    That puts the explicit 0 in the first element of xs, and then the rest
    of the array is cleared with zeros.
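
    A minimal sketch of that rule, assuming nothing beyond standard C (the
    helper names are hypothetical, chosen only for illustration). It counts
    the non-zero elements left after a one-element initialiser:

```c
#include <stddef.h>

/* Count the non-zero elements of a[0..n-1]. */
static size_t count_nonzero(const int *a, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 0) count++;
    }
    return count;
}

/* After "= { 0 }": the explicit 0 goes into xs[0] and the remaining
   99 elements are zeroed implicitly. */
size_t nonzero_after_zero_init(void) {
    int xs[100] = { 0 };
    return count_nonzero(xs, 100);
}

/* After "= { 7 }": only ys[0] holds 7. The initialiser is not repeated,
   so ys[1..99] are zero. */
size_t nonzero_after_seven_init(void) {
    int ys[100] = { 7 };
    return count_nonzero(ys, 100);
}
```

    The second function is there because "{ 0 }" is sometimes misread as
    "fill with 0"; the same syntax with 7 leaves only one non-zero element.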

    I recommend never using "char" as a type unless you really mean a
    character, limited to 7-bit ASCII. So if your "outliers" array really
    is an array of such characters, "char" is fine. If it is intended to be numbers and for some reason you specifically want 8-bit values, use
    "uint8_t" or "int8_t", and initialise with { 0 }.

    A major lesson here is to learn how to use your tools. C is not a
    forgiving language. Make use of all the help your tools can give you -
    enable warnings here. "gcc -Wall" enables a range of common warnings
    with few false positives in normal well-written code, including ones
    that check for attempts to read uninitialised data. "-Wextra" enables a
    slew of extra warnings. Some of these will annoy people and trigger on
    code they find reasonable, while most are good choices for a lot of code
    - but personal preference varies significantly. And remember to enable optimisation, since it makes the static checking more powerful.

    If you /really/ want gcc to zero out such local data automatically, use "-ftrivial-auto-var-init=zero". But it is much better to use warnings
    and write correct code - options like that one are an addition to
    well-checked code for paranoid software in security-critical contexts.

  • From Janis Papanagnou@21:1/5 to DFS on Wed Jun 12 23:38:55 2024
    On 12.06.2024 22:47, DFS wrote:
    [...]

    Yeah, I had a similar problem like you; I had a declaration

    char answer[100];

    and was surprised that it wasn't initialized with "42".

    Seriously; why do you expect [in C] a declaration to initialize that
    stack object? (There are other languages that do initializations as
    the language defines it, but C doesn't; it may help to learn before
    programming in any language?) And why do you think that "" would be
    an appropriate initialization (i.e. a single '\0' character) and not
    all 100 elements set to '\0'? (Someone else might want to access the
    element 'answer[99]'.) And should we pay for initializing 1000000000
    characters in case one declares an appropriate huge array?

    Janis

  • From DFS@21:1/5 to Barry Schwarz on Wed Jun 12 17:53:35 2024
    On 6/12/2024 5:30 PM, Barry Schwarz wrote:
    On Wed, 12 Jun 2024 16:47:23 -0400, DFS wrote:

    [...]

    Makes perfect sense. The first rule of undefined behavior is
    "Whatever happens is exactly correct." You are not entitled to any expectations and none of the behavior (or perhaps all of the behavior)
    can be called unexpected.

    I HATE bogus answers like this.

    Aren't you embarrassed to say things like that?



    Since we cannot see your code, I will guess that you use a non-zero
    value in outliers[i] to indicate that the corresponding value has been identified as an outlier.


    No.

    I compare the data point to the lower and upper bounds of a stat rule
    commonly called the "IQR Rule":

    lo = Q1 - (1.5 * IQR)
    hi = Q3 + (1.5 * IQR)

    If it falls outside the range of lo-hi I strcat the value to a char.
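
    As a rough sketch of that check in isolation (the type and function
    names here are hypothetical, not taken from the program below):

```c
#include <stdbool.h>

/* Lower and upper fences of the 1.5 * IQR rule. */
typedef struct {
    double lo;
    double hi;
} Fences;

/* lo = Q1 - 1.5 * IQR, hi = Q3 + 1.5 * IQR, with IQR = Q3 - Q1. */
Fences iqr_fences(double q1, double q3) {
    double iqr = q3 - q1;
    Fences f = { q1 - 1.5 * iqr, q3 + 1.5 * iqr };
    return f;
}

/* A value is an outlier only if it lies strictly outside [lo, hi]. */
bool is_outlier(double val, Fences f) {
    return val < f.lo || val > f.hi;
}
```

    For the consecutive values 1 to 40, taking Q1 and Q3 as the medians of
    the lower and upper halves gives Q1 = 10.5 and Q3 = 30.5, so IQR = 20,
    lo = -19.5 and hi = 60.5. By this rule no value from 1 to 60 is an
    outlier.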

    The outlier routine starts line 170.

    If you change

    char outliers[200]="", temp[10]="";
    to
    char outliers[200], temp[10];

    you might see what happens when you run the program for consecutive values:

    $ ./prog 100 -c


    =========================================================================

    //this code is hereby released to the public domain

    #include <stdlib.h>
    #include <stdio.h>
    #include <math.h>
    #include <string.h>
    #include <time.h>

    /*
    this program computes the descriptive statistics of a randomly
    generated set of N integers

    1.0 release Dec 2020
    2.0 release Jun 2024

    used the population skewness and Kurtosis formulas from:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
    also test the results of this code against that site

    compile: gcc -Wall prog.c -o prog -lm
    usage : ./prog N -option (where N is 2 or higher, and option is -r or
    -c or -o)
    -r generates N random numbers
    -c generates consecutive numbers 1 to N
    -o generates random numbers with outliers
    */


    //random ints
    int randNbr(int low, int high) {
    return (low + rand() / (RAND_MAX / (high - low + 1) + 1));
    }

    //comparator function used with qsort
    int compareint (const void * a, const void * b)
    {
    if (*(int*)a > *(int*)b) return 1;
    else if (*(int*)a < *(int*)b) return -1;
    else return 0;
    }


    int main(int argc, char *argv[])
    {
    if(argc < 3) {
    printf("Missing argument:\n");
    printf(" * enter a number greater than 2\n");
    printf(" * enter an option -r -c or -o\n");
    exit(0);
    }


    //vars
    int i=0, lastmode=0;
    int N = atoi(argv[1]);
    int nums[N];
    //int *nums = malloc(N * sizeof(int));

    double sumN=0.0, median=0.0, Q1=0.0, Q2=0.0, Q3=0.0, IQR=0.0;
    double stddev = 0.0, kurtosis = 0.0;
    double sqrdiffmean = 0.0, cubediffmean = 0.0, quaddiffmean = 0.0;
    double meanabsdev = 0.0, rootmeansqr = 0.0;
    char mode[100], tmp[12];

    //generate random dataset
    if(strcmp(argv[2],"-r") == 0) {
    srand(time(NULL));
    for(i=0;i<N;i++) { nums[i] = randNbr(1,N*3); }

    printf("%d Randoms:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    qsort(nums,N,sizeof(int),compareint);
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //generate random dataset with outliers
    if(strcmp(argv[2],"-o") == 0) {
    srand(time(NULL));
    nums[0] = 1; nums[1] = 3;
    for(i=2;i<N-2;i++) { nums[i] = randNbr(100,N*30); }
    nums[N-2] = 1000; nums[N-1] = 2000;

    printf("%d Randoms with outliers:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    qsort(nums,N,sizeof(int),compareint);
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }


    //generate consecutive numbers 1 to N
    if(strcmp(argv[2],"-c") == 0) {
    for(i=0;i<N;i++) { nums[i] = i + 1; }

    printf("%d Consecutive:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //various
    for(i=0;i<N;i++) {sumN += nums[i];}
    double min = nums[0], max = nums[N-1];


    //calc descriptive stats
    double mean = sumN / (double)N;
    int ucnt = 1, umaxcnt=1;
    for(i = 0; i < N; i++)
    {
    sqrdiffmean += pow(nums[i] - mean, 2); // for variance and sum squares
    cubediffmean += pow(nums[i] - mean, 3); // for skewness
    quaddiffmean += pow(nums[i] - mean, 4); // for Kurtosis
    meanabsdev += fabs((nums[i] - mean)); // for mean absolute deviation
    rootmeansqr += nums[i] * nums[i]; // for root mean square

    //mode
    if(ucnt == umaxcnt && lastmode != nums[i])
    {
    sprintf(tmp,"%d ",nums[i]);
    strcat(mode,tmp);
    }

    if(nums[i]-nums[i+1]!=0) {ucnt=1;} else {ucnt++;}

    if(ucnt>umaxcnt)
    {
    umaxcnt=ucnt;
    memset(mode, '\0', sizeof(mode));
    sprintf(tmp, "%d ", nums[i]);
    strcat(mode, tmp);
    lastmode = nums[i];
    }
    }


    // median and quartiles
    // quartiles divide sorted dataset into four sections
    // Q1 = median of values less than Q2
    // Q2 = median of the data set
    // Q3 = median of values greater than Q2
    if(N % 2 == 0) {
    Q2 = median = (nums[(N/2)-1] + nums[N/2]) / 2.0;
    i = N/2;
    if(i % 2 == 0) {
    Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
    Q3 = (nums[i + ((i-1)/2)] + nums[i+(i/2)]) / 2.0;
    }
    if(i % 2 != 0) {
    Q1 = nums[(i-1)/2];
    Q3 = nums[i + ((i-1)/2)];
    }
    }

    if(N % 2 != 0) {
    Q2 = median = nums[(N-1)/2];
    i = (N-1)/2;
    if(i % 2 == 0) {
    Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
    Q3 = (nums[i + (i/2)] + nums[i + (i/2) + 1]) / 2.0;
    }
    if(i % 2 != 0) {
    Q1 = nums[(i-1)/2];
    Q3 = nums[i + ((i+1)/2)];
    }
    }



    // outliers: below Q1−1.5xIQR, or above Q3+1.5xIQR
    IQR = Q3 - Q1;
    char outliers[200]="", temp[10]="";
    if (N > 3) {

    //range for outliers
    double lo = Q1 - (1.5 * IQR);
    double hi = Q3 + (1.5 * IQR);

    //no outliers
    if ( min > lo && max < hi) {
    strcat(outliers,"none (using IQR * 1.5 rule)");
    }

    //at least one outlier
    if ( min < lo || max > hi) {
    for(i = 0; i < N; i++) {
    double val = (double)nums[i];
    if(val < lo || val > hi) {
    sprintf(temp,"%.0f ",val);
    temp[strlen(temp)] = '\0';
    strcat(outliers,temp);
    }
    }
    strcat(outliers," (using IQR * 1.5 rule)");
    }
    outliers[strlen(outliers)] = '\0';
    }


    stddev = sqrt(sqrdiffmean/N);
    kurtosis = quaddiffmean / (N * pow(sqrt(sqrdiffmean/N),4));


    //output
    printf("\n--------------------------------------------------------------\n");
    printf("Minimum = %.0f\n", min);
    printf("Maximum = %.0f\n", max);
    printf("Range = %.0f\n", max - min);
    printf("Size N = %d\n" , N);
    printf("Sum N = %.0f\n", sumN);
    printf("Mean μ = %.2f\n", mean);
    printf("Median = %.1f\n", median);
    if(umaxcnt > 1) {
    printf("Mode(s) = %s (%d occurrences ea)\n", mode,umaxcnt);}
    if(umaxcnt < 2) {
    printf("Mode(s) = na (no repeating values)\n");}
    printf("Std Dev σ = %.4f\n", stddev);
    printf("Variance σ^2 = %.4f\n", sqrdiffmean/N);
    printf("Mid Range = %.1f\n", (max + min)/2);
    printf("Quartiles");
    if(N > 3) {printf(" Q1 = %.1f\n", Q1);}
    if(N < 4) {printf(" Q1 = na\n");}
    printf(" Q2 = %.1f (median)\n", Q2);
    if(N > 3) {printf(" Q3 = %.1f\n", Q3);}
    if(N < 4) {printf(" Q3 = na\n");}
    printf("IQR = %.1f (interquartile range)\n", IQR);
    if(N > 3) {printf("Outliers = %s\n", outliers);}
    if(N < 4) {printf("Outliers = na\n");}
    printf("Sum Squares SS = %.2f\n", sqrdiffmean);
    printf("MAD = %.4f (mean absolute deviation)\n", meanabsdev / N);
    printf("Root Mean Sqr = %.4f\n", sqrt(rootmeansqr / N));
    printf("Std Error Mean = %.4f\n", stddev / sqrt(N));
    printf("Skewness γ1 = %.4f\n", cubediffmean / (N * pow(sqrt(sqrdiffmean/N),3)));
    printf("Kurtosis β2 = %.4f\n", kurtosis);
    printf("Kurtosis Excess α4 = %.4f\n", kurtosis - 3);
    printf("CV = %.6f (coefficient of variation)\n", sqrt(sqrdiffmean/N) / mean);
    printf("RSD = %.4f%% (relative std deviation)\n", 100 * (sqrt(sqrdiffmean/N) / mean));
    printf("--------------------------------------------------------------\n");
    printf("Check results against\n");
    printf("https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php");
    printf("\n\n");

    //free(nums);
    return(0);
    }

    =========================================================================
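
    One thing that may be worth checking in the listing above, offered as a
    reading of the posted code rather than a confirmed diagnosis: mode is
    declared as char mode[100] with no initialiser, and with -c every value
    is unique, so the mode loop appears to strcat "%d " for every number.
    The sketch below (helper name hypothetical) only counts how many bytes
    that string would need, assuming mode happened to start out empty:

```c
#include <stdio.h>
#include <stddef.h>

/* Bytes needed, including the terminating '\0', for the string built by
   appending "%d " for each of 1..n. Assumes the buffer starts empty. */
size_t bytes_needed_for_consecutive(int n) {
    size_t total = 1;                     /* terminating '\0' */
    for (int v = 1; v <= n; v++) {
        /* snprintf with a NULL buffer and size 0 only reports the length */
        total += (size_t)snprintf(NULL, 0, "%d ", v);
    }
    return total;
}
```

    This gives 100 bytes for N = 36 and 103 bytes for N = 37, so under that
    assumption a 100-byte buffer is overrun from N = 37 onwards. Where the
    extra bytes land depends on the stack layout of the particular build; if
    they land in an uninitialised outliers array, text such as "41 42 43"
    could already be sitting there before the outlier code runs. That last
    step is a guess about this build, not something the listing proves.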






  • From DFS@21:1/5 to Keith Thompson on Wed Jun 12 18:34:22 2024
    On 6/12/2024 6:22 PM, Keith Thompson wrote:
    Janis Papanagnou writes:
    On 12.06.2024 22:47, DFS wrote:
    [...]
    before: char outliers[100];
    after : char outliers[100] = "";
    [...]
    Seriously; why do you expect [in C] a declaration to initialize that
    stack object? (There are other languages that do initializations as
    the language defines it, but C doesn't; it may help to learn before
    programming in any language?) And why do you think that "" would be
    an appropriate initialization (i.e. a single '\0' character) and not
    all 100 elements set to '\0'? (Someone else might want to access the
    element 'answer[99]'.) And should we pay for initializing 1000000000
    characters in case one declares an appropriate huge array?

    This:
    char outliers[100] = "";
    initializes all 100 elements to zero. So does this:
    char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to zero.

    If you want to set an array's 0th element to 0 and not waste time initializing the rest, you can assign it separately:
    char outliers[100];
    outliers[0] = '\0';
    or
    char outliers[100];
    strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    Thanks. I'll have to remember these things. I like to use char arrays.

    The problem is I don't use C very often, so I don't develop muscle memory.
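
    The two approaches quoted above can be sketched as follows. This is an
    illustration only; the function names are hypothetical:

```c
#include <string.h>
#include <stddef.h>

/* Returns 1 if all 100 bytes are zero after initialising with "". */
int empty_string_init_zeroes_all(void) {
    char outliers[100] = "";
    for (size_t i = 0; i < sizeof outliers; i++) {
        if (outliers[i] != '\0') return 0;
    }
    return 1;
}

/* Sets only element 0 and then appends. strcat needs a terminated
   destination string, which outliers[0] = '\0' provides; the other
   99 bytes are never read before being written. Returns the length. */
size_t terminate_then_append(void) {
    char outliers[100];
    outliers[0] = '\0';
    strcat(outliers, "41 ");
    strcat(outliers, "42 ");
    return strlen(outliers);
}
```

    Either form makes the first strcat well defined. What strcat cannot
    cope with is a destination that has never been given a terminator,
    because it has to search for the '\0' before it can append.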

  • From DFS@21:1/5 to Keith Thompson on Wed Jun 12 19:07:29 2024
    On 6/12/2024 6:30 PM, Keith Thompson wrote:
    DFS writes:
    On 6/12/2024 5:30 PM, Barry Schwarz wrote:
    On Wed, 12 Jun 2024 16:47:23 -0400, DFS wrote:

    [...]

    Also, why doesn't gcc just do you a solid and initialize to "" for you?

    Makes perfect sense. The first rule of undefined behavior is
    "Whatever happens is exactly correct." You are not entitled to any
    expectations and none of the behavior (or perhaps all of the behavior)
    can be called unexpected.

    I HATE bogus answers like this.

    Aren't you embarrassed to say things like that?

    He has nothing to be embarrassed about. What he wrote is correct.

    No it's not.

    "Whatever happens is exactly correct." is nonsense.

    "You are not entitled to any expectations" is nonsense.




    The C standard's definition of "undefined behavior" is "behavior, upon
    use of a nonportable or erroneous program construct or of erroneous
    data, for which this International Standard imposes no requirements".

    If you don't like the way C deals with undefined behavior, that's
    perfectly valid, and a lot of people are likely to agree with you.

    Thanks for feeling my pain!

    It's frustrating. By now I spent a half-hour dealing with it. gcc
    could've just filled the char[] variable with 0s by default. I bet that
    would save a LOT of people time and headaches.


    But I advise against lashing out at people who are correctly explaining
    what the C standard says.

    The C standard really says "Whatever happens is exactly correct."?



    DFS, since you've been posting in comp.lang.c for at least ten years,

    Time flies.

    How do you know I've posted here that long?



    I'm surprised you're having difficulties with this.

    I'm surprised at some of the wonkiness of gcc and C.

    * warns relentlessly when the printf specifier doesn't match the var
    type, but gives no warning when you use an int with memset (instead of
    the size_t specified in the function prototype).

    * a missing bracket } throws 50 nonsensical compiler errors.

    * warns of unused vars but not uninitialized ones

    * one uninitialized var makes your program do crazy things. Worse than
    crazy is it's identically crazy each time.

    ./prog 40 -c
    outliers: none

    ./prog 41 -c
    outliers: 41

    ./prog 42 -c
    outliers: 41 42

    ./prog 43 -c
    outliers: 41 42 43

    ./prog 44 -c
    outliers: 41 42 43 44

    etc. And none were outliers - not even close.

    At least if it showed nonsense data it would be easier to track down.
    Maybe.

    The thing is, none of those values (40+) were ever in that char[] prior
    to running the code for a set of 50 consecutive values.

    And I edited/compiled the code many times, but still got the identical
    error.

    I doubt my environment (gcc 11.4 on Windows Subsys for Linux on Ubuntu)
    has anything to do with it.

  • From DFS@21:1/5 to David Brown on Wed Jun 12 18:29:27 2024
    On 6/12/2024 5:38 PM, David Brown wrote:
    On 12/06/2024 22:47, DFS wrote:
    [...]


    It is /really/ difficult to know exactly what your problem is without
    seeing your C code!  There may be other problems that you haven't seen yet.

    The outlier section starts on line 169

    =====================================================================================

    //this code is hereby released to the public domain

    #include <stdlib.h>
    #include <stdio.h>
    #include <math.h>
    #include <string.h>
    #include <time.h>

    /*
    this program computes the descriptive statistics of a randomly
    generated set of N integers

    1.0 release Dec 2020
    2.0 release Jun 2024

    used the population skewness and Kurtosis formulas from:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
    also test the results of this code against that site

    compile: gcc -Wall prog.c -o prog -lm
    usage : ./prog N -option (where N is 2 or higher, and option is -r or
    -c or -o)
    -r generates N random numbers
    -c generates consecutive numbers 1 to N
    -o generates random numbers with outliers
    */


    //random ints
    int randNbr(int low, int high) {
    return (low + rand() / (RAND_MAX / (high - low + 1) + 1));
    }

    //comparator function used with qsort
    int compareint (const void * a, const void * b)
    {
    if (*(int*)a > *(int*)b) return 1;
    else if (*(int*)a < *(int*)b) return -1;
    else return 0;
    }


    int main(int argc, char *argv[])
    {
    if(argc < 3) {
    printf("Missing argument:\n");
    printf(" * enter a number greater than 2\n");
    printf(" * enter an option -r -c or -o\n");
    exit(0);
    }


    //vars
    int i=0, lastmode=0;
    int N = atoi(argv[1]);
    int nums[N];

    double sumN=0.0, median=0.0, Q1=0.0, Q2=0.0, Q3=0.0, IQR=0.0;
    double stddev = 0.0, kurtosis = 0.0;
    double sqrdiffmean = 0.0, cubediffmean = 0.0, quaddiffmean = 0.0;
    double meanabsdev = 0.0, rootmeansqr = 0.0;
    char mode[100], tmp[12];

    //generate random dataset
    if(strcmp(argv[2],"-r") == 0) {
    srand(time(NULL));
    for(i=0;i<N;i++) { nums[i] = randNbr(1,N*3); }

    printf("%d Randoms:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    qsort(nums,N,sizeof(int),compareint);
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //generate random dataset with outliers
    if(strcmp(argv[2],"-o") == 0) {
    srand(time(NULL));
    nums[0] = 1; nums[1] = 3;
    for(i=2;i<N-2;i++) { nums[i] = randNbr(100,N*30); }
    nums[N-2] = 1000; nums[N-1] = 2000;

    printf("%d Randoms with outliers:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    qsort(nums,N,sizeof(int),compareint);
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }


    //generate consecutive numbers 1 to N
    if(strcmp(argv[2],"-c") == 0) {
    for(i=0;i<N;i++) { nums[i] = i + 1; }

    printf("%d Consecutive:\n", N);
    printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
    printf("\nWith commas : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
    }

    //various
    for(i=0;i<N;i++) {sumN += nums[i];}
    double min = nums[0], max = nums[N-1];


    //calc descriptive stats
    double mean = sumN / (double)N;
    int ucnt = 1, umaxcnt=1;
    for(i = 0; i < N; i++)
    {
    sqrdiffmean += pow(nums[i] - mean, 2); // for variance and sum squares
    cubediffmean += pow(nums[i] - mean, 3); // for skewness
    quaddiffmean += pow(nums[i] - mean, 4); // for Kurtosis
    meanabsdev += fabs((nums[i] - mean)); // for mean absolute deviation
    rootmeansqr += nums[i] * nums[i]; // for root mean square

    //mode
    if(ucnt == umaxcnt && lastmode != nums[i])
    {
    sprintf(tmp,"%d ",nums[i]);
    strcat(mode,tmp);
    }

    if(nums[i]-nums[i+1]!=0) {ucnt=1;} else {ucnt++;}

    if(ucnt>umaxcnt)
    {
    umaxcnt=ucnt;
    memset(mode, '\0', sizeof(mode));
    sprintf(tmp, "%d ", nums[i]);
    strcat(mode, tmp);
    lastmode = nums[i];
    }
    }


    // median and quartiles
    // quartiles divide sorted dataset into four sections
    // Q1 = median of values less than Q2
    // Q2 = median of the data set
    // Q3 = median of values greater than Q2
    if(N % 2 == 0) {
    Q2 = median = (nums[(N/2)-1] + nums[N/2]) / 2.0;
    i = N/2;
    if(i % 2 == 0) {
    Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
    Q3 = (nums[i + ((i-1)/2)] + nums[i+(i/2)]) / 2.0;
    }
    if(i % 2 != 0) {
    Q1 = nums[(i-1)/2];
    Q3 = nums[i + ((i-1)/2)];
    }
    }

    if(N % 2 != 0) {
    Q2 = median = nums[(N-1)/2];
    i = (N-1)/2;
    if(i % 2 == 0) {
    Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
    Q3 = (nums[i + (i/2)] + nums[i + (i/2) + 1]) / 2.0;
    }
    if(i % 2 != 0) {
    Q1 = nums[(i-1)/2];
    Q3 = nums[i + ((i+1)/2)];
    }
    }



    // outliers: below Q1−1.5xIQR, or above Q3+1.5xIQR
    IQR = Q3 - Q1;
    char outliers[200]="", temp[10]="";
    if (N > 3) {

    //range for outliers
    double lo = Q1 - (1.5 * IQR);
    double hi = Q3 + (1.5 * IQR);

    //no outliers
    if ( min > lo && max < hi) {
    strcat(outliers,"none (using IQR * 1.5 rule)");
    }

    //at least one outlier
    if ( min < lo || max > hi) {
    for(i = 0; i < N; i++) {
    double val = (double)nums[i];
    if(val < lo || val > hi) {
    sprintf(temp,"%.0f ",val);
    temp[strlen(temp)] = '\0';
    strcat(outliers,temp);
    }
    }
    strcat(outliers," (using IQR * 1.5 rule)");
    }
    outliers[strlen(outliers)] = '\0';
    }


    stddev = sqrt(sqrdiffmean/N);
    kurtosis = quaddiffmean / (N * pow(sqrt(sqrdiffmean/N),4));


    //output
    printf("\n--------------------------------------------------------------\n");
    printf("Minimum = %.0f\n", min);
    printf("Maximum = %.0f\n", max);
    printf("Range = %.0f\n", max - min);
    printf("Size N = %d\n" , N);
    printf("Sum N = %.0f\n", sumN);
    printf("Mean μ = %.2f\n", mean);
    printf("Median = %.1f\n", median);
    if(umaxcnt > 1) {
    printf("Mode(s) = %s (%d occurrences ea)\n", mode,umaxcnt);}
    if(umaxcnt < 2) {
    printf("Mode(s) = na (no repeating values)\n");}
    printf("Std Dev σ = %.4f\n", stddev);
    printf("Variance σ^2 = %.4f\n", sqrdiffmean/N);
    printf("Mid Range = %.1f\n", (max + min)/2);
    printf("Quartiles");
    if(N > 3) {printf(" Q1 = %.1f\n", Q1);}
    if(N < 4) {printf(" Q1 = na\n");}
    printf(" Q2 = %.1f (median)\n", Q2);
    if(N > 3) {printf(" Q3 = %.1f\n", Q3);}
    if(N < 4) {printf(" Q3 = na\n");}
    printf("IQR = %.1f (interquartile range)\n", IQR);
    if(N > 3) {printf("Outliers = %s\n", outliers);}
    if(N < 4) {printf("Outliers = na\n");}
    printf("Sum Squares SS = %.2f\n", sqrdiffmean);
    printf("MAD = %.4f (mean absolute deviation)\n", meanabsdev / N);
    printf("Root Mean Sqr = %.4f\n", sqrt(rootmeansqr / N));
    printf("Std Error Mean = %.4f\n", stddev / sqrt(N));
    printf("Skewness γ1 = %.4f\n", cubediffmean / (N * pow(sqrt(sqrdiffmean/N),3)));
    printf("Kurtosis β2 = %.4f\n", kurtosis);
    printf("Kurtosis Excess α4 = %.4f\n", kurtosis - 3);
    printf("CV = %.6f (coefficient of variation)\n", sqrt(sqrdiffmean/N) / mean);
    printf("RSD = %.4f%% (relative std deviation)\n", 100 * (sqrt(sqrdiffmean/N) / mean));
    printf("--------------------------------------------------------------\n");
    printf("Check results against\n");
    printf("https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php");
    printf("\n\n");

    return(0);
    }
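The four quartile branches in the listing above all follow one rule: Q1 and Q3 are the medians of the lower and upper halves, and for odd N the median itself belongs to neither half. A minimal sketch of that rule follows, assuming the data is already sorted ascending; `median_of_slice` and `quartiles` are hypothetical helper names, not part of the posted program.

```c
#include <assert.h>

/* Median of the sorted slice nums[start .. start+count-1]; count must be > 0. */
double median_of_slice(const int *nums, int start, int count)
{
    assert(count > 0);
    int mid = start + count / 2;
    if (count % 2 != 0)
        return nums[mid];                              /* odd count: middle element */
    return ((double)nums[mid - 1] + nums[mid]) / 2.0;  /* even count: mean of middle pair */
}

/* Quartiles by the median-of-halves rule. nums must be sorted, n >= 4. */
void quartiles(const int *nums, int n, double *q1, double *q2, double *q3)
{
    assert(n >= 4);
    int half = n / 2;            /* size of the lower and the upper half */
    int upper_start = n - half;  /* skips the median element when n is odd */
    *q2 = median_of_slice(nums, 0, n);
    *q1 = median_of_slice(nums, 0, half);
    *q3 = median_of_slice(nums, upper_start, half);
}
```

For the data 1..8 this gives Q1 = 2.5, Q2 = 4.5 and Q3 = 6.5, and for 1..7 it gives 2, 4 and 6, which is what the index arithmetic in the listing produces for the same inputs.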


    =====================================================================================



    Non-static local variables without initialisers have "indeterminate"
    value if there is no initialiser.  Trying to use these "indeterminate"
    values is undefined behaviour - you have absolutely no control over what
    might happen.  Any particular behaviour you see is down to luck from the
    rest of the code and what happened to be in memory at the time.

    In 2024 that's surprising. I can't be the only one to forget to
    initialize a char[] variable.



    There is no automatic initialisation of non-static local variables,
    because that would often be inefficient.

    It would've saved me half an hour of frustration.

    Now I'm getting 'stack smashing detected' errors (after the program runs correctly) when using datasets of consecutive numbers.

    hmmmm 2 issues in a row using consecutives - that's a clue!



    The best way to avoid errors
    like yours, IMHO, is not to declare such variables until you have data
    to put in them - thus you always have a sensible initialiser of real
    data.  Occasionally that is not practical, but it works in most cases.

    Data is definitely going in them: either the value 'none' or a list of
    the outliers and some text.



    For a data array, zero initialisation is common.  Typically you do this
    with :

        int xs[100] = { 0 };

    That puts the explicit 0 in the first element of xs, and then the rest
    of the array is cleared with zeros.

    I recommend never using "char" as a type unless you really mean a
    character, limited to 7-bit ASCII.  So if your "outliers" array really
    is an array of such characters, "char" is fine.  If it is intended to be
    numbers and for some reason you specifically want 8-bit values, use
    "uint8_t" or "int8_t", and initialise with { 0 }.

    I did mean characters, limited to: 0-9a-zA-Z()

    I think I'm using the char variable correctly.
    sprintf(tempchar,"%d ",outlier);
    strcat(char,tempchar);


    A major lesson here is to learn how to use your tools.  C is not a
    forgiving language.  Make use of all the help your tools can give you - enable warnings here.  "gcc -Wall" enables a range of common warnings
    with few false positives in normal well-written code, including ones
    that check for attempts to read uninitialised data.

    I always use -Wall, and I was using it here.


    "-Wextra" enables a
    slew of extra warnings.  Some of these will annoy people and trigger on
    code they find reasonable, while most are good choices for a lot of code
    - but personal preference varies significantly.  And remember to enable optimisation, since it makes the static checking more powerful.

    Just did this:
    gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm

    and no warnings or errors at all.

    But: it now aborts near the front when using consecutive data points
    (but not randoms).

    *** buffer overflow detected ***: terminated
    Aborted

    I'm actually happy about that. I should be able to find and fix it.



    If you /really/ want gcc to zero out such local data automatically, use "-ftrivial-auto-var-init=zero".  But it is much better to use warnings
    and write correct code - options like that one are an addition to well-checked code for paranoid software in security-critical contexts.


    Great answer! I can always count on D Brown for excellent advice.
    Thank you.

  • From Janis Papanagnou@21:1/5 to Keith Thompson on Thu Jun 13 02:19:59 2024
    On 13.06.2024 00:22, Keith Thompson wrote:

    This:
    char outliers[100] = "";
    initializes all 100 elements to zero. So does this:
    char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to zero.

    Oops! This surprised me. (But you are right.) The overhead isn't [syntactically] obvious, but I'm anyway always setting a single
    '\0' character if I want to store strings in a 'char[]' and have
    it initialized to an empty string (like below).

    If you want to set an array's 0th element to 0 and not waste time initializing the rest, you can assign it separately:
    char outliers[100];
    outliers[0] = '\0';
    or
    char outliers[100];
    strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    It wouldn't occur to me to use the strcpy() function, but is the
    function call really that expensive in C ?

    Janis

  • From Ike Naar@21:1/5 to DFS on Thu Jun 13 07:25:58 2024
    On 2024-06-12, DFS <[email protected]> wrote:
    //no outliers
    if ( min > lo && max < hi) {

    The condition for 'no outliers' is not the complement of
    the condition for 'at least one outlier' below.

    strcat(outliers,"none (using IQR * 1.5 rule)");
    }

    //at least one outlier
    if ( min < lo || max > hi) {
    for(i = 0; i < N; i++) {
    double val = (double)nums[i];
    if(val < lo || val > hi) {
    sprintf(temp,"%.0f ",val);
    temp[strlen(temp)] = '\0';

    This is unnecessary;
    sprintf terminates the generated string with a null character.

    strcat(outliers,temp);
    }
    }
    strcat(outliers," (using IQR * 1.5 rule)");
    }

  • From bart@21:1/5 to DFS on Thu Jun 13 10:43:51 2024
    On 12/06/2024 21:47, DFS wrote:
    Wrote a C program to mimic the stats shown on:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

    My code compiles and works fine - every stat matches - except for one
    anomaly: when using a dataset of consecutive numbers 1 to N, all values > 40
    are flagged as outliers.  Up to 40, no problem.  Random numbers
    dataset of any size: no problem.

    And values 41+ definitely don't meet the conditions for outliers (using
    the IQR * 1.5 rule).

    Very strange.

    Edit: I just noticed I didn't initialize a char:
    before: char outliers[100];
    after : char outliers[100] = "";

    And the problem went away.  Reset it to before and problem came back.

    Makes no sense.  What could cause the program to go FUBAR at data point
    41+ only when the dataset is consecutive numbers?

    I assume outliers is inside a function.

    What are the 100 values of outliers if you don't initialise it? You can
    try printing them out (as individual numbers not as a string) although
    just doing that, and adding that extra code, may change the actual values.

    However that doesn't matter if it still goes wrong; you may still get a
    hint as to why it's behaving as it is.

    Also, why doesn't gcc just do you a solid and initialize to "" for you?

    Initialising to "" will zero the entire array. You really want the
    compiler to do that work, even when you're going to overwrite it anyway?

  • From David Brown@21:1/5 to Keith Thompson on Thu Jun 13 14:42:22 2024
    On 13/06/2024 00:18, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    [...]
    I recommend never using "char" as a type unless you really mean a
    character, limited to 7-bit ASCII. So if your "outliers" array really
    is an array of such characters, "char" is fine. If it is intended to
    be numbers and for some reason you specifically want 8-bit values, use
    "uint8_t" or "int8_t", and initialise with { 0 }.
    [...]

    The implementation-definedness of plain char is awkward, but char
    arrays generally work just fine for UTF-8 strings.

    Yes, but "generally work" is not quite as strong as I would like. My preference for UTF-8 strings is a const unsigned char type (with C23, it
    will be char8_t, which is defined to be the same type as "unsigned
    char"). But u8"Hello, world" UTF-8 string literals (since C11) are
    considered to be like an array of type "char" in C (until C23), so I
    guess UTF-8 strings will be safe in plain char arrays. Still, the bytes
    in a UTF-8 string are code units with values between 0 and 255, so I
    prefer to store these in a type that can hold that range of values.

    (What happens if you have a platform that uses ones' complement
    arithmetic, with "char" being signed and a range of -127 to +127, and
    you have a u8"..." string which has a code unit of 0x80 that cannot be represented in "char" ? It's just a hypothetical question, of course.)
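To illustrate the point about code units ranging from 0 to 255: when a UTF-8 string lives in a plain char array, a common approach is to convert each element to unsigned char before inspecting its bits. The sketch below counts code points that way. `utf8_count_code_points` is a hypothetical helper, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Counts code points in a null-terminated UTF-8 string by skipping
   continuation bytes, which have the bit pattern 10xxxxxx. */
size_t utf8_count_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* Convert first, so the test behaves the same whether plain
           char is signed or unsigned on this platform. */
        unsigned char unit = (unsigned char)*s;
        if ((unit & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

The string can still be passed to the standard string functions, since its type stays plain char; only the inspection of individual bytes goes through unsigned char.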


    If char is
    signed, byte values greater than 127 will be stored as negative
    values, but it will almost certainly just work (if your system
    is configured to handle UTF-8). Likewise for Latin-1 and similar
    8-bit character sets.

    The standard string functions operate on arrays of plain char, so
    storing UTF-8 strings in arrays of uint8_t or unsigned char will
    seriously restrict what you can do with them.

    (I'd like to see a future standard require plain char to be unsigned,
    but I don't know how likely that is.)


    I would also prefer that, but too much existing code relies on plain
    char being signed on the platforms it runs on. I personally think the
    idea of having signed or unsigned characters is a very poor choice of
    names for the terms, but it's way too late to change that! C23 has
    "char8_t" which is always unsigned.

    (In C23, "char8_t" is defined in <uchar.h> and is the same type as
    "unsigned char". In C++20, in contrast, "char8_t" is a keyword and a
    distinct type with identical size and range to "unsigned char".)

  • From David Brown@21:1/5 to Janis Papanagnou on Thu Jun 13 15:28:20 2024
    On 13/06/2024 02:19, Janis Papanagnou wrote:
    On 13.06.2024 00:22, Keith Thompson wrote:

    This:
    char outliers[100] = "";
    initializes all 100 elements to zero. So does this:
    char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to zero.

    Oops! This surprised me. (But you are right.) The overhead isn't [syntactically] obvious, but I'm anyway always setting a single
    '\0' character if I want to store strings in a 'char[]' and have
    it initialized to an empty string (like below).

    If you want to set an array's 0th element to 0 and not waste time
    initializing the rest, you can assign it separately:
    char outliers[100];
    outliers[0] = '\0';
    or
    char outliers[100];
    strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    It wouldn't occur to me to use the strcpy() function, but is the
    function call really that expensive in C ?


    That depends on your toolchain.

    If you are using a Windows-based compiler with an external DLL for the C library and the compiler doesn't handle the strcpy() directly, then it
    can be quite a lot of overhead. You have the call to the DLL, which
    involves a few steps of indirection. The library strcpy() may be
    optimised for handling large strings, and may save and restore a lot of registers (such as SIMD vector registers).

    If you are using a compiler (whatever the platform) that optimises
    "strcpy", it will generate identical code to "outliers[0] = '\0';".

  • From David Brown@21:1/5 to DFS on Thu Jun 13 15:21:54 2024
    On 13/06/2024 00:34, DFS wrote:
    On 6/12/2024 6:22 PM, Keith Thompson wrote:
    Janis Papanagnou <[email protected]> writes:
    On 12.06.2024 22:47, DFS wrote:
    [...]
    before: char outliers[100];
    after : char outliers[100] = "";
    [...]
    Seriously; why do you expect [in C] a declaration to initialize that
    stack object? (There are other languages that do initializations as
    the language defines it, but C doesn't; it may help to learn before
    programming in any language?) And why do you think that "" would be
    an appropriate initialization (i.e. a single '\0' character) and not
    all 100 elements set to '\0'? (Someone else might want to access the
    element 'answer[99]'.) And should we pay for initializing 1000000000
    characters in case one declares an appropriate huge array?

    This:
         char outliers[100] = "";
    initializes all 100 elements to zero.  So does this:
         char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to zero.

    Yes. It's good to point that out, since people might assume that using
    a string literal here only initialises the bit covered by that string
    literal.

    (In C23 you can also write "char outliers[100] = {};" to get all zeros.)
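A small check of the rule that elements not named in an initialiser are set to zero. `all_zero` and `initialisers_zero_everything` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte of buf[0..len-1] is zero, otherwise 0. */
int all_zero(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\0')
            return 0;
    }
    return 1;
}

/* Returns 1 if both initialiser spellings leave all 100 elements zero. */
int initialisers_zero_everything(void)
{
    char from_string[100] = "";        /* string literal shorter than the array */
    char from_braces[100] = { '\0' };  /* only the first element is named */
    return all_zero(from_string, sizeof from_string)
        && all_zero(from_braces, sizeof from_braces);
}
```

On a conforming compiler `initialisers_zero_everything()` returns 1: in both declarations the remaining 99 elements are zeroed as well.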


    If you want to set an array's 0th element to 0 and not waste time
    initializing the rest, you can assign it separately:
         char outliers[100];
         outliers[0] = '\0';
    or
         char outliers[100];
         strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    A good compiler will generate the same code for both cases - strcpy() is
    often inlined for such uses.


    Thanks.  I'll have to remember these things.  I like to use char arrays.

    The problem is I don't use C very often, so I don't develop muscle memory.


    What programming language do you usually use? And why are you writing
    in C instead? (Or do you simply not do much programming?)

  • From David Brown@21:1/5 to DFS on Thu Jun 13 15:15:55 2024
    On 13/06/2024 00:29, DFS wrote:
    On 6/12/2024 5:38 PM, David Brown wrote:
    On 12/06/2024 22:47, DFS wrote:
    Wrote a C program to mimic the stats shown on:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

    My code compiles and works fine - every stat matches - except for one
    anomaly: when using a dataset of consecutive numbers 1 to N, all
    values  > 40 are flagged as outliers.  Up to 40, no problem.  Random
    numbers dataset of any size: no problem.

    And values 41+ definitely don't meet the conditions for outliers
    (using the IQR * 1.5 rule).

    Very strange.

    Edit: I just noticed I didn't initialize a char:
    before: char outliers[100];
    after : char outliers[100] = "";

    And the problem went away.  Reset it to before and problem came back.

    Makes no sense.  What could cause the program to go FUBAR at data
    point 41+ only when the dataset is consecutive numbers?

    Also, why doesn't gcc just do you a solid and initialize to "" for you?


    It is /really/ difficult to know exactly what your problem is without
    seeing your C code!  There may be other problems that you haven't seen
    yet.

    The outlier section starts on line 169
    =====================================================================================

    <snip>

    Apart from the initialisation issue, I would suggest you re-consider the
    way you add strings to the "outliers" buffer. If there are too many of
    them, it will overflow - there's nothing to stop you putting more than
    200 characters into it. I would recommend dropping the "temp" variable
    and instead keep track of a pointer to the terminating null character of
    your current "outliers" string. Use "snprintf" to "print" directly into
    the string, rather than going via "temp", and use the return value of
    the "snprintf" to update your end pointer. You will easily be able to
    avoid the risk of overrun, while also being slightly more efficient too.
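A minimal sketch of that suggestion, under some assumptions: `list_outliers` is a hypothetical function name, it tracks the end of the string as an offset rather than a pointer, and it reports a too-small buffer by returning -1 (the original program has no such error path).

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Writes every value outside [lo, hi] into out, each followed by a space.
   Returns 0 on success, or -1 if out (capacity cap) was too small; in that
   case out still holds a terminated string with the entries that fit. */
int list_outliers(char *out, size_t cap, const int *nums, int n,
                  double lo, double hi)
{
    assert(out != NULL && cap > 0);
    size_t used = 0;   /* offset of the terminating null character in out */
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (nums[i] >= lo && nums[i] <= hi)
            continue;  /* inside the fences: not an outlier */
        /* Print straight into the free tail of the buffer. */
        int len = snprintf(out + used, cap - used, "%d ", nums[i]);
        if (len < 0 || (size_t)len >= cap - used) {
            out[used] = '\0';  /* discard the partially written entry */
            return -1;
        }
        used += (size_t)len;   /* the return value moves the end marker */
    }
    return 0;
}
```

Because each snprintf call is told exactly how much room is left, the buffer cannot overrun regardless of how many outliers the data contains, and no rescanning of the string is needed to find its end.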

    The line:

    outliers[strlen(outliers)] = '\0';

    is completely useless. "strlen" starts at the beginning of "outliers",
    and counts along until it finds a null character - thus either "outliers[strlen(outliers)]" is already equal to '\0', or your attempt
    at calculating "strlen" with an overrun buffer will lead to more
    undefined behaviour.


    Non-static local variables without initialisers have "indeterminate"
    value if there is no initialiser.  Trying to use these "indeterminate"
    values is undefined behaviour - you have absolutely no control over
    what might happen.  Any particular behaviour you see is down to luck
    from the rest of the code and what happened to be in memory at the time.

    In 2024 that's surprising.  I can't be the only one to forget to
    initialize a char[] variable.


    You are not - attempting to use an uninitialised variable is a common
    error. That is why C compilers provide warnings about this kind of
    thing, along with run-time tools like the sanitizers Ben recommended, to
    help find such mistakes. But compiler vendors can't force people to use
    such tools and warning flags, nor can the tools find /all/ cases of
    errors. At some point, programmers have to take responsibility for
    knowing the language they are using, and writing their code correctly.
    Good tools and good use of those tools is an aid to careful coding, not
    an alternative to it.



    There is no automatic initialisation of non-static local variables,
    because that would often be inefficient.

    It would've saved me half an hour of frustration.

    And the things you have learned as a result - from your own debugging,
    and the threads here - will save you many more hours of frustration in
    the future.

    There are languages that focus on ease of use and do all the management
    of things like strings and buffers, and prevent users from mistakes like
    this, at the cost of slower run-times. There are languages that do very
    little automatically for the programmer and have absolutely minimal
    overheads, for maximal efficiency. C is the latter kind of language.

    Remember, while you might see automatic initialisation of local
    variables as a negligible overhead, other people might not - I've worked
    on C code for microcontrollers where a wasted processor cycle or two is
    too much. If your code does not care about such efficiencies, then you
    have to question whether C is the right language in the first place. I
    believe most modern code that is written in C would be better if it were written in other higher level languages (precisely because a half hour
    of /your/ time is usually more valuable than a few microseconds of your computer's time).


    On the subject of initialisation, I strongly suggest that you do /not/
    get in the habit of always initialising your variables to 0 when you
    define them. Do that only if 0 is the real, appropriate starting value.
    Prefer to avoid declaring the variable at all until you need it, then
    define it with its initial value (and consider making it "const" to
    reduce the risk of other coding errors). If the structure of the code
    requires you to define the variable before you have a value for it,
    prefer to leave it without an initial value. Then compiler warnings
    have a much better chance of spotting mistakes.


    Now I'm getting 'stack smashing detected' errors (after the program runs correctly) when using datasets of consecutive numbers.


    I think Ben found that buffer overrun for you, and showed you how to
    find it yourself in the future.

    hmmmm 2 issues in a row using consecutives - that's a clue!



    The best way to avoid errors like yours, IMHO, is not to declare such
    variables until you have data to put in them - thus you always have a
    sensible initialiser of real data.  Occasionally that is not
    practical, but it works in most cases.

    Data is definitely going in them: either the value 'none' or a list of
    the outliers and some text.


    Now that I have your source code, I can see the error is the way you put
    data in - strcat() reads the existing data, it does not just write data.
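To make the "reads the existing data" point concrete, here is a simplified stand-in for strcat. `sketch_strcat` is a hypothetical name and real library implementations are far more optimised, but the contract is the same: the destination must already hold a terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified illustration of what strcat has to do. */
char *sketch_strcat(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    char *end = dest;
    /* Step 1: READ dest until a null character turns up. With an
       uninitialised buffer this walks over whatever bytes happen to be
       there, possibly past the end of the array. */
    while (*end != '\0')
        end++;
    /* Step 2: copy src, terminator included, starting at that position. */
    while ((*end++ = *src++) != '\0')
        ;
    return dest;
}
```

With `char outliers[100] = "";` step 1 stops immediately at element 0. Without the initialiser, where it stops depends on leftover stack contents, which would be consistent with the symptoms changing from one dataset to another.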



    For a data array, zero initialisation is common.  Typically you do
    this with :

         int xs[100] = { 0 };

    That puts the explicit 0 in the first element of xs, and then the rest
    of the array is cleared with zeros.

    I recommend never using "char" as a type unless you really mean a
    character, limited to 7-bit ASCII.  So if your "outliers" array really
    is an array of such characters, "char" is fine.  If it is intended to
    be numbers and for some reason you specifically want 8-bit values, use
    "uint8_t" or "int8_t", and initialise with { 0 }.

    I did mean characters, limited to: 0-9a-zA-Z()

    OK.


    I think I'm using the char variable correctly.
     sprintf(tempchar,"%d ",outlier);
     strcat(char,tempchar);

    Yes. Without your source code, I could only guess.

    But see earlier in this post for a suggestion to improve your use of the variable.



    A major lesson here is to learn how to use your tools.  C is not a
    forgiving language.  Make use of all the help your tools can give you
    - enable warnings here.  "gcc -Wall" enables a range of common
    warnings with few false positives in normal well-written code,
    including ones that check for attempts to read uninitialised data.

    I always use -Wall, and I was using it here.


    Good. Unfortunately, good though gcc is, it is not perfect. Improving warnings is a continuous endeavour for the gcc developers, but they
    usually have to err on the side of avoiding false positives.


    "-Wextra" enables a
    slew of extra warnings.  Some of these will annoy people and trigger
    on code they find reasonable, while most are good choices for a lot of
    code - but personal preference varies significantly.  And remember to
    enable optimisation, since it makes the static checking more powerful.

    Just did this:
    gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm


    "-O3" is rarely much use - stick to "-O2" for normal use. The extra optimisations enabled by "-O3" help in some code, but work worse on
    other code due to the increased size, so they should be used with care. Certainly "-O3" is rarely worth it unless you are also using a "-march="
    flag (such as "-march=native") to tune for a particular processor and
    enable stuff like vectorisation. Getting the fastest code is more of an
    art than a science!

    and no warnings or errors at all.

    But: it now aborts near the front when using consecutive data points
    (but not randoms).

    *** buffer overflow detected ***: terminated
    Aborted

    I'm actually happy about that.  I should be able to find and fix it.



    If you /really/ want gcc to zero out such local data automatically,
    use "-ftrivial-auto-var-init=zero".  But it is much better to use
    warnings and write correct code - options like that one are an
    addition to well-checked code for paranoid software in
    security-critical contexts.


    Great answer!   I can always count on D Brown for excellent advice.
    Thank you.


    I try :-)

    You get the best results by combining the advice from a variety of people
    here, along with your own experimentations.

  • From DFS@21:1/5 to Ike Naar on Thu Jun 13 11:13:04 2024
    On 6/13/2024 3:25 AM, Ike Naar wrote:
    On 2024-06-12, DFS <[email protected]> wrote:
    //no outliers
    if ( min > lo && max < hi) {

    The condition for 'no outliers' is not the complement of
    the condition for 'at least one outlier' below.

    You're saying some outliers will not be flagged?



    strcat(outliers,"none (using IQR * 1.5 rule)");
    }

    //at least one outlier
    if ( min < lo || max > hi) {
    for(i = 0; i < N; i++) {
    double val = (double)nums[i];
    if(val < lo || val > hi) {
    sprintf(temp,"%.0f ",val);
    temp[strlen(temp)] = '\0';

    This is unnecessary;
    sprintf terminates the generated string with a null character.

    Thanks.


    strcat(outliers,temp);
    }
    }
    strcat(outliers," (using IQR * 1.5 rule)");
    }

  • From DFS@21:1/5 to David Brown on Thu Jun 13 10:38:12 2024
    On 6/13/2024 9:21 AM, David Brown wrote:
    On 13/06/2024 00:34, DFS wrote:
    On 6/12/2024 6:22 PM, Keith Thompson wrote:
    Janis Papanagnou <[email protected]> writes:
    On 12.06.2024 22:47, DFS wrote:
    [...]
    before: char outliers[100];
    after : char outliers[100] = "";
    [...]
    Seriously; why do you expect [in C] a declaration to initialize that
    stack object? (There are other languages that do initializations as
    the language defines it, but C doesn't; it may help to learn before
    programming in any language?) And why do you think that "" would be
    an appropriate initialization (i.e. a single '\0' character) and not
    all 100 elements set to '\0'? (Someone else might want to access the
    element 'answer[99]'.) And should we pay for initializing 1000000000
    characters in case one declares an appropriate huge array?

    This:
         char outliers[100] = "";
    initializes all 100 elements to zero.  So does this:
         char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to zero.

    Yes.  It's good to point that out, since people might assume that using
    a string literal here only initialises the bit covered by that string literal.

    (In C23 you can also write "char outliers[100] = {};" to get all zeros.)


    If you want to set an array's 0th element to 0 and not waste time
    initializing the rest, you can assign it separately:
         char outliers[100];
         outliers[0] = '\0';
    or
         char outliers[100];
         strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    A good compiler will generate the same code for both cases - strcpy() is often inlined for such uses.


    Thanks.  I'll have to remember these things.  I like to use char arrays.
    The problem is I don't use C very often, so I don't develop muscle
    memory.


    What programming language do you usually use?  And why are you writing
    in C instead?  (Or do you simply not do much programming?)

    I write a little code every few days. Mostly python.

    I like C for its blazing speed. Very addicting. And it's much more
    challenging/frustrating than python.

    I coded a subset (8 stat measures) of this C program 3.5 years ago, and recently decided to finish duplicating all 23 stats shown at:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

    Working on the outliers code, I decided to add an option to generate
    data with consecutive numbers. That's when I ran $./dfs 50 -c and
    noticed every value above 40 was considered an outlier. And this didn't
    change over a bunch of code edits/file saves/compiles.

    Understanding how an uninitialized variable caused that persistent issue
    is beyond my pay grade.

    That's when I whined to clc. Before I even posted, though, I spotted
    the uninitialized var (outliers). Later I spotted another one (mode).

    One led to 'undefined behavior', the other to 'stack smashing'. Both
    only occurred when using consecutive numbers.

    But with y'all's help I believe I found and fixed ALL issues. I can
    dream anyway.

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Thu Jun 13 15:39:25 2024
    Malcolm McLean <[email protected]> writes:
    On 13/06/2024 01:33, Keith Thompson wrote:

    printf is a variadic function, so the types of the arguments after
    the format string are not specified in its declaration. The printf
    function has to *assume* that arguments have the types specified
    by the format string. This:
    printf("%d\n", foo);
    (probably) has undefined behavior if foo is of type size_t.

    And isn't that a nightmare?

    No, because compilers have been able to diagnose mismatches
    for more than two decades.



    We just can't have size_t variables swilling around in programs for these
    reasons.

    POSIX defines a set of strings that can be used by a programmer to
    specify the format string for size_t on any given implementation.

  • From Lew Pitcher@21:1/5 to DFS on Thu Jun 13 15:49:46 2024
    On Thu, 13 Jun 2024 11:13:04 -0400, DFS wrote:

    On 6/13/2024 3:25 AM, Ike Naar wrote:
    On 2024-06-12, DFS <[email protected]> wrote:
    //no outliers
    if ( min > lo && max < hi) {

    The condition for 'no outliers' is not the complement of
    the condition for 'at least one outlier' below.

    You're saying some outliers will not be flagged?

    [1] How does the above statement evaluate when (min == low) and (max == hi)?


    strcat(outliers,"none (using IQR * 1.5 rule)");
    }

    //at least one outlier
    if ( min < lo || max > hi) {

    [2] How does the above statement evaluate when (min == low) and (max == hi)?


    [3] Given the answers to questions 1 and 2, are there any values that
    satisfy /both/ the "no outliers" and "at least one outlier" conditions?
    Are there any values that satisfy /neither/ conditions?

    [snip]


    HTH
    --
    Lew Pitcher
    "In Skills We Trust"

  • From Scott Lurndal@21:1/5 to DFS on Thu Jun 13 15:40:29 2024
    DFS <[email protected]> writes:
    On 6/13/2024 3:25 AM, Ike Naar wrote:

    temp[strlen(temp)] = '\0';

    This is unnecessary;
    sprintf terminates the generated string with a null character.

    Thanks.

    Most programmers should consider sprintf to be deprecated and
    should never use it. snprintf is safer and more capable.
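
A minimal sketch of what the extra safety looks like in practice, assuming a caller that appends formatted values to a fixed-size buffer. `append_value` is a hypothetical helper, not something from the program under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one formatted value to the string already
   held in dst without ever writing past dstsize bytes.
   Returns true if the text fit, false if it had to be truncated. */
bool append_value(char *dst, size_t dstsize, double value)
{
    assert(dst != NULL && dstsize > 0);
    size_t used = strlen(dst);            /* dst must already hold a string */
    assert(used < dstsize);
    int n = snprintf(dst + used, dstsize - used, "%.2f ", value);
    /* snprintf returns the length it wanted to write; a result at least
       as large as the space offered means the output was cut short. */
    return n >= 0 && (size_t)n < dstsize - used;
}
```

Unlike sprintf, the truncated case leaves a valid, terminated string in dst and tells the caller about it, so an oversized value cannot smash the stack.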

  • From Ben Bacarisse@21:1/5 to Scott Lurndal on Thu Jun 13 18:08:03 2024
    [email protected] (Scott Lurndal) writes:

    POSIX defines a set of strings that can be used by a programmer to
    specify the format string for size_t on any given implementation.

    And C provides "%zu".
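
A short sketch of that in use, assuming a C99 or later library. `format_size` is a made-up helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a size_t as decimal text into buf.
   "%zu" is the standard conversion for size_t since C99, so no cast
   and no platform-specific macro is needed. */
int format_size(char *buf, size_t bufsize, size_t value)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "%zu", value);
}
```

The same "%zu" works directly in printf, for example printf("%zu\n", strlen(s)).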

    --
    Ben.

  • From DFS@21:1/5 to Lew Pitcher on Thu Jun 13 13:05:43 2024
    On 6/13/2024 11:49 AM, Lew Pitcher wrote:
    On Thu, 13 Jun 2024 11:13:04 -0400, DFS wrote:

    On 6/13/2024 3:25 AM, Ike Naar wrote:
    On 2024-06-12, DFS <[email protected]> wrote:

    //no outliers
    if ( min > lo && max < hi) {

    The condition for 'no outliers' is not the complement of
    the condition for 'at least one outlier' below.

    You're saying some outliers will not be flagged?

    [1] How does the above statement evaluate when (min == low) and (max == hi)?




    //at least one outlier
    if ( min < lo || max > hi) {

    [2] How does the above statement evaluate when (min == low) and (max == hi)?



    [3] Given the answers to questions 1 and 2, are there any values that
    satisfy /both/ the "no outliers" and "at least one outlier" conditions?
    Are there any values that satisfy /neither/ conditions?



    [snip]


    HTH


    It does help. The original code won't miss any outliers, but it also
    won't notify you there were none in the exceedingly rare case that the
    bounds of the dataset exactly match the bounds of the outlier rule.

    No outliers test:
    Orig : if (min > lo && max < hi)
    Fixed: if (min >= lo && max <= hi)

    At least one outlier test:
    Orig: if (min < lo || max > hi) {
    No fix necessary
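
One way to make sure the two tests can never disagree is to write only one of them and negate it. This is a sketch under that assumption, with hypothetical function names; min, max, lo and hi mean the same as in the code above.

```c
#include <assert.h>
#include <stdbool.h>

/* True when at least one data point lies strictly outside [lo, hi]. */
bool has_outliers(double min, double max, double lo, double hi)
{
    assert(lo <= hi);   /* the fences must be ordered */
    return min < lo || max > hi;
}

/* "No outliers" is the exact negation, so every dataset satisfies
   exactly one of the two conditions, including min == lo, max == hi. */
bool no_outliers(double min, double max, double lo, double hi)
{
    return !has_outliers(min, max, lo, hi);
}
```

An if/else on has_outliers() gives the same guarantee without the second function.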


    Thanks Lew.

  • From bart@21:1/5 to Scott Lurndal on Thu Jun 13 19:01:23 2024
    On 13/06/2024 16:39, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 13/06/2024 01:33, Keith Thompson wrote:

    printf is a variadic function, so the types of the arguments after
    the format string are not specified in its declaration. The printf
    function has to *assume* that arguments have the types specified
    by the format string. This:
    printf("%d\n", foo);
    (probably) has undefined behavior if foo is of type size_t.

    And isn't that a nightmare?

    No, because compilers have been able to diagnose mismatches
    for more than two decades.

    What about the previous 3 decades?

    What about the compilers that can't do that?

    What about even the latest gcc 14.1 that won't diagnose it even with
    -Wpedantic -Wextra?

    What about when the format string is a variable?

    What about the example given below?

    It is definitely a language problem. Dealing with some of it with some compilers with some options isn't a solution, it's just a workaround.

    Meanwhile for over 4 decades I've been able to just write 'print foo'
    with no format mismatch, because such a silly concept doesn't exist.
    THAT's how you deal with it.




    We just can't have size_t variables swilling around in programs for these
    reasons.

    POSIX defines a set of strings that can be used by a programmer to
    specify the format string for size_t on any given implementation.

    And here it just gets even uglier. You also get situations like this:

    uint64_t i=0;
    printf("%lld\n", i);

    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.

    It can't tell you that you should be using one of those ludicrous macros.

    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say anything.

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 00:55:12 2024
    Malcolm McLean <[email protected]> writes:

    On 13/06/2024 19:01, bart wrote:

    And here it just gets even uglier. You also get situations like this:
    uint64_t i=0;
    printf("%lld\n", i);
    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.
    It can't tell you that you should be using one of those ludicrous macros.
    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    Exactly. We can't have this just to print out an integer.

    This is how C works. There's no point in moaning about it. Use another language or do what you have to in C.

    In Baby X I provide a function called bbx_malloc(). It is guaranteed
    never to return null. Currently it just calls exit() on allocation failure.
    But it also limits allocation to slightly under INT_MAX. Which should be
    plenty for a Baby program, and if you want more, you always have big boy's
    malloc.

    And if you need to change the size?

    But at a stroke, that gets rid of any need for size_t,

    But sizeof, strlen (and friends like the mbs... and wcs... functions),
    strspn (and friend), strftime, fread, fwrite. etc. etc. all return
    size_t.

    For people taught to ignore size_t, care is also needed when calling
    functions that take size_t arguments as the signed to unsigned
    conversion can cause surprises when not flagged by the compiler. I
    don't know if I am right, but I would bet that many of the "don't bother
    with size_t" crowd are also in the "don't bother with all those warning
    flags to the compiler" crowd.
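
A small sketch of the surprise in question, with one possible guard. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An int handed to a size_t parameter is converted silently:
   -1 becomes SIZE_MAX, the largest possible size. */
size_t silently_converted(int n)
{
    return (size_t)n;   /* same value an implicit conversion would give */
}

/* Hypothetical checked conversion: refuses negative values instead of
   turning them into enormous sizes. */
bool int_to_size(int n, size_t *out)
{
    assert(out != NULL);
    if (n < 0)
        return false;
    *out = (size_t)n;
    return true;
}
```

Passing the result of silently_converted(-1) to malloc, memcpy or fread is where the trouble starts; gcc's -Wsign-conversion is one of the flags that reports the implicit form.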

    and long is very
    special purpose (it holds the 32 bit rgba values).

    Isn't that rather wasteful when long is 64 bits?

    --
    Ben.

  • From bart@21:1/5 to Keith Thompson on Fri Jun 14 02:18:45 2024
    On 13/06/2024 23:58, Keith Thompson wrote:
    bart <[email protected]> writes:

    Meanwhile for over 4 decades I've been able to just write 'print foo'
    with no format mismatch, because such a silly concept doesn't exist.
    THAT's how you deal with it.

    By using a different language, which perhaps you should consider
    discussing in a different newsgroup. We discuss C here.

    That was my point about the 3 decades it took to do something about it.
    In the end nothing really changed.

    If foo is an int, for example, printf lets you decide how to print
    it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
    in C23), upper vs. lower case for hex). Perhaps "print foo" in
    your language has similar features.

    The format string specifies two things. One is to do with the type of an
    expression, which the compiler knows. After all, that's how it can
    sometimes tell you you've got it wrong.

    And if it can do that, it could also put in the format for you.


    Yes, the fact that incorrect printf format strings cause undefined
    behavior, and that that's sometimes difficult to diagnose, is a
    language problem. I don't recall anyone saying it isn't. But it's
    really not that hard to deal with it as a programmer.

    If you have ideas (other than abandoning C) for a flexible
    type-safe printing function, by all means share them. What are your suggestions?

    A few years ago I played with a "%?" format code in my 'bcc' compiler
    and demonstrated it here. The ? gets replaced by some suitable format
    code. This is done within the compiler, not the printf library.

    For other display control, such as hex output, or to provide other info
    such as width, that still needs to be provided as it is done now.

    This would cover most of my points except variable format strings, which
    you said were not worth worrying about.

    Here is a demo:

    --------------------------
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void) {
    uint64_t a = 0xFFFFFFFF00000000;
    float b = 1.46;
    int c = -67;
    char* d = "Hello";
    int* e = &c;

    for (int i=0; i<100000000; ++i);

    clock_t f = clock();

    printf("%=? %=? %=? %=? %=? %=?\n", a, b, c, d, e, f);
    printf("%=? %=? %=? %=? %=? %=?\n", f, e, d, c, b, a);
    }
    --------------------------

    This prints 6 variables of diverse types with a suitable default format.
    Then it prints them in reverse order, without having to change those
    format codes.

    The '=' is an extra feature which displays the name of the argument.

    The output from this was:

    A=18446744069414584320 B=1.460000 C=-67 D=Hello E=000000000080FF08 F=219
    F=219 E=000000000080FF08 D=Hello C=-67 B=1.460000 A=18446744069414584320

    It's not quite as good as my language where it's just:

    println =a, =b, =c, =d, =e, =f

    but I think it was an interesting experiment. This required 50 lines of
    code within my C compiler; a bit more for a full treatment.







    Adding `print` as a new keyword so you can use `print
    foo` is unlikely to be considered practical; I'd want a much more
    general mechanism that's not limited to stdio files. Reasonable new
    language features that enable type-safe printf-like functions could
    be interesting. I'm not aware of any such proposals for C.

    We just can't have size_t variables swilling around in programs for these
    reasons.
    POSIX defines a set of strings that can be used by a programmer to
    specify the format string for size_t on any given implementation.

    And here it just gets even uglier. You also get situations like this:

    uint64_t i=0;
    printf("%lld\n", i);

    This compiles OK with gcc -Wall, on Windows64. But compile under
    Linux64 and it complains the format should be %ld. Change it to %ld,
    and it complains under Windows.

    It can't tell you that you should be using one of those ludicrous macros.

    And you know why, right? uint64_t is a typedef (an alias) for some
    existing type, typically either unsigned long or unsigned long long.
    If uint64_t is a typedef for unsigned long long, then i is of type
    unsigned long long, and the format string is correct.

    Sure, that's a language problem. It's unfortunate that code can be
    either valid or a constraint violation depending on how the current implementation defines uint64_t. I just don't spend much time
    complaining about it.
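
For reference, the macros being complained about come from <inttypes.h> (C99), and they let the same source line be correct whichever type uint64_t is an alias for. A minimal sketch, with `format_u64` as a made-up helper name:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: PRIu64 expands to the right length modifier and
   conversion for uint64_t on the current implementation ("lu", "llu",
   ...), so the format matches the argument on every platform. */
int format_u64(char *buf, size_t bufsize, uint64_t value)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "%" PRIu64, value);
}
```

The direct printf form is printf("%" PRIu64 "\n", i); it relies on adjacent string literals being concatenated.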

    I wouldn't mind seeing a new kind of typedef that creates a new type
    rather than an alias. Then uint64_t could be a distinct type.
    That could cause some problems for _Generic, for example.

    C99 added <stdint.h>, defining fixed-width and other integer types using existing language features. Sure, there are some disadvantages in the
    way it was done. The alternative, creating new language features, would likely have resulted in the proposal not being accepted until some time
    after C99, if ever.

    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    The standard allows using an argument of an integer type with a format
    of the corresponding type of the other signedness, as long as the value
    is in the range of both. (I vaguely recall the standard's wording being
    a bit vague on this point.)


  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 12:44:13 2024
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 00:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 13/06/2024 19:01, bart wrote:

    And here it just gets even uglier. You also get situations like this:
    uint64_t i=0;
    printf("%lld\n", i);
    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.
    It can't tell you that you should be using one of those ludicrous macros.
    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    Exactly. We can't have this just to print out an integer.
    This is how C works. There's no point in moaning about it. Use another
    language or do what you have to in C.

    In Baby X I provide a function called bbx_malloc(). It is guaranteed
    never to return null. Currently it just calls exit() on allocation failure.
    But it also limits allocation to slightly under INT_MAX. Which should be
    plenty for a Baby program, and if you want more, you always have big boy's
    malloc.
    And if you need to change the size?

    But at a stroke, that gets rid of any need for size_t,
    But sizeof, strlen (and friends like the mbs... and wcs... functions),
    strspn (and friend), strftime, fread, fwrite. etc. etc. all return
    size_t.

    But these are not Baby X functions.

    Neither is malloc but you wanted to replace that to get rid of the need
    for size_t.

    I confess that I am all at sea about what you are doing. In essence, I
    don't understand the rules of the game so I should probably just stop commenting.

    and long is very
    special purpose (it holds the 32 bit rgba values).
    Isn't that rather wasteful when long is 64 bits?

    No, because we store images as unsigned char buffers. But it's convenient
    to pass around colour values in a single variable.

    Right. So you don't always use long for "holding rgba values". Another
    rule I didn't know.

    However there is the worry that accessing rgba channels as bytes rather
    than insisting that the buffer be aligned, and accessing as a 32-bit
    value,

    Which is why I thought you might be including images in the notion of
    "holding rgba values".

    --
    Ben.

  • From Richard Harnden@21:1/5 to Malcolm McLean on Fri Jun 14 16:32:58 2024
    On 14/06/2024 15:30, Malcolm McLean wrote:
    Yes, I really need to get that website together so that people cotton on
    to what Baby X is, what it can and cannot do, and what is the point.

    Is it a shell? A windowing toolkit? A filesystem? A resource compiler?

    I have no idea.

  • From David Brown@21:1/5 to Keith Thompson on Fri Jun 14 19:13:42 2024
    On 14/06/2024 01:47, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    [...]
    Certainly "-O3" is rarely worth it unless you are also using a
    "-march=" flag (such as "-fmarch=native") to tune for a particular
    processor and enable stuff like vectorisation. Getting the fastest
    code is more of an art than a science!

    Typo: it's "-march=native".


    Thanks.

  • From David Brown@21:1/5 to Keith Thompson on Fri Jun 14 19:08:13 2024
    On 14/06/2024 00:58, Keith Thompson wrote:
    bart <[email protected]> writes:
    On 13/06/2024 16:39, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 13/06/2024 01:33, Keith Thompson wrote:



    If foo is an int, for example, printf lets you decide how to print
    it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
    in C23), upper vs. lower case for hex). Perhaps "print foo" in
    your language has similar features.


    C23 also adds explicit width length modifiers. So instead of having to
    guess if uint64_t is "%llu" or "%lu" on a particular platform, or using
    the PRIu64 macro, you can now use "%w64u" for uint64_t (or
    uint_least64_t if the exact width type does not exist). I think that's
    about as neat as you could get, within the framework of printf.

    Yes, the fact that incorrect printf format strings cause undefined
    behavior, and that that's sometimes difficult to diagnose, is a
    language problem. I don't recall anyone saying it isn't. But it's
    really not that hard to deal with it as a programmer.

    It is particularly easy if you have a decent compiler and know how to
    enable the right warning flags!


    If you have ideas (other than abandoning C) for a flexible
    type-safe printing function, by all means share them. What are your suggestions? Adding `print` as a new keyword so you can use `print
    foo` is unlikely to be considered practical; I'd want a much more
    general mechanism that's not limited to stdio files. Reasonable new
    language features that enable type-safe printf-like functions could
    be interesting. I'm not aware of any such proposals for C.


    It is possible to come a long way with variadic macros and _Generic.
    You can at least end up being able to write something like :

    int x = 123;
    const char * s = "Hello, world!";
    uint64_t u = 0x4242;

    Print("X = ", x, " the string is ", s, " and u = 0x",
    as_hex(u, 6), newline);

    rather than:

    printf("X = %i the string is %s and u = 0x%06lx\n", x, s, (unsigned long)u);


    Which you think is better is a matter of opinion.
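
To give a flavour of the _Generic half of that idea, here is a minimal sketch, not the macro set described above. It assumes C11; `DEMO_FMT`, `DEMO_FORMAT` and `demo_format_length` are hypothetical names, and types that are not listed (char, short, pointers other than strings) are rejected at compile time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick a default printf conversion from the static type of x.
   Only basic types are listed: typedefs such as size_t or uint64_t are
   aliases for one of them, and listing an alias as well would be a
   duplicate association. */
#define DEMO_FMT(x) _Generic((x),   \
    int:                "%d",       \
    unsigned int:       "%u",       \
    long:               "%ld",      \
    unsigned long:      "%lu",      \
    long long:          "%lld",     \
    unsigned long long: "%llu",     \
    float:              "%f",       \
    double:             "%f",       \
    char *:             "%s",       \
    const char *:       "%s")

/* Format one value into buf using the conversion chosen above. */
#define DEMO_FORMAT(buf, size, x) snprintf((buf), (size), DEMO_FMT(x), (x))

/* Example: strlen returns size_t, and the macro selects whichever
   conversion matches the underlying type on this platform. */
int demo_format_length(char *buf, size_t size, const char *s)
{
    assert(buf != NULL && s != NULL);
    return DEMO_FORMAT(buf, size, strlen(s));
}
```

A full Print(...) that accepts several arguments needs a variadic macro on top of this to apply the selection to each argument in turn.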


    I wouldn't mind seeing a new kind of typedef that creates a new type
    rather than an alias. Then uint64_t could be a distinct type.
    That could cause some problems for _Generic, for example.

    I too would like such a typedef. Using it for uint64_t would cause
    problems for /existing/ uses of _Generic, but would make future uses better.

  • From Scott Lurndal@21:1/5 to David Brown on Fri Jun 14 17:36:19 2024
    David Brown <[email protected]> writes:
    On 13/06/2024 16:38, DFS wrote:

    What programming language do you usually use?  And why are you writing
    in C instead?  (Or do you simply not do much programming?)

    I write a little code every few days.  Mostly python.

    Certainly if I wanted to calculate some statistics from small data sets,
    I'd go for Python - I would not consider C unless it was for an
    embedded system.

    I'd likely turn to R instead of Python for that.

  • From David Brown@21:1/5 to DFS on Fri Jun 14 19:18:35 2024
    On 13/06/2024 16:38, DFS wrote:
    On 6/13/2024 9:21 AM, David Brown wrote:
    On 13/06/2024 00:34, DFS wrote:
    On 6/12/2024 6:22 PM, Keith Thompson wrote:
    Janis Papanagnou <[email protected]> writes:
    On 12.06.2024 22:47, DFS wrote:
    [...]
    before: char outliers[100];
    after : char outliers[100] = "";
    [...]
    Seriously; why do you expect [in C] a declaration to initialize that
    stack object? (There are other languages that do initializations as
    the language defines it, but C doesn't; it may help to learn before
    programming in any language?) And why do you think that "" would be
    an appropriate initialization (i.e. a single '\0' character) and not
    all 100 elements set to '\0'? (Someone else might want to access the
    element 'answer[99]'.) And should we pay for initializing 1000000000
    characters in case one declares an appropriate huge array?

    This:
         char outliers[100] = "";
    initializes all 100 elements to zero.  So does this:
         char outliers[100] = { '\0' };
    Any elements or members not specified in an initializer are set to
    zero.

    Yes.  It's good to point that out, since people might assume that
    using a string literal here only initialises the bit covered by that
    string literal.

    (In C23 you can also write "char outliers[100] = {};" to get all zeros.)


    If you want to set an array's 0th element to 0 and not waste time
    initializing the rest, you can assign it separately:
         char outliers[100];
         outliers[0] = '\0';
    or
         char outliers[100];
         strcpy(outliers, "");
    though the overhead of the function call is likely to outweigh the
    cost of initializing the array.

    A good compiler will generate the same code for both cases - strcpy()
    is often inlined for such uses.
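
Both points can be checked with a short sketch. The function names are hypothetical; the array and the message text are the ones from the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the non-zero bytes in a freshly initialised buffer. With = ""
   every one of the 100 elements is zero, so the result is 0. */
size_t count_nonzero_after_init(void)
{
    char outliers[100] = "";
    size_t nonzero = 0;
    for (size_t i = 0; i < sizeof outliers; i++)
        if (outliers[i] != '\0')
            nonzero++;
    return nonzero;
}

/* strcat needs its destination to hold a string already; the
   initialiser guarantees that. Without it, strcat would search
   indeterminate bytes for a terminator, which is undefined behaviour. */
size_t build_message(char *out, size_t outsize)
{
    char outliers[100] = "";
    strcat(outliers, "none (using IQR * 1.5 rule)");
    assert(out != NULL && outsize > strlen(outliers));
    strcpy(out, outliers);
    return strlen(out);
}
```

Why the uninitialised version happened to misbehave only for some datasets depends on what was left on the stack, which is exactly the kind of thing undefined behaviour makes unpredictable.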


    Thanks.  I'll have to remember these things.  I like to use char arrays.
    The problem is I don't use C very often, so I don't develop muscle
    memory.


    What programming language do you usually use?  And why are you writing
    in C instead?  (Or do you simply not do much programming?)

    I write a little code every few days.  Mostly python.

    Certainly if I wanted to calculate some statistics from small data sets,
    I'd go for Python - I would not consider C unless it was for an
    embedded system.


    I like C for its blazing speed.  Very addicting.  And it's much more
    challenging/frustrating than python.

    With small data sets, Python has blazing speed - /every/ language has
    blazing speed. And for large data sets, use numpy on Python and you
    /still/ have blazing speeds - a lot faster than anything you would write
    in C (because numpy's underlying code is written in C by people who are
    much better at writing fast numeric code than you or I).

    The only reason to use C for something like this is for the challenge and
    fun, which is fair enough.


    I coded a subset (8 stat measures) of this C program 3.5 years ago, and recently decided to finish duplicating all 23 stats shown at:

    https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

    Working on the outliers code, I decided to add an option to generate
    data with consecutive numbers.  That's when I ran $./dfs 50 -c and
    noticed every value above 40 was considered an outlier.  And this didn't change over a bunch of code edits/file saves/compiles.

    Understanding how an uninitialized variable caused that persistent issue
    is beyond my pay grade.

    Understanding that you should not read from a variable that has never
    been given a value is well within the pay grade of every programmer.
    And it's something that every C programmer should understand. (And now
    you understand it too!)


    That's when I whined to clc.  Before I even posted, though, I spotted
    the uninitialized var (outliers).  Later I spotted another one (mode).

    One led to 'undefined behavior', the other to 'stack smashing'.  Both
    only occurred when using consecutive numbers.

    But with y'all's help I believe I found and fixed ALL issues.  I can
    dream anyway.


  • From bart@21:1/5 to Malcolm McLean on Fri Jun 14 19:31:29 2024
    On 14/06/2024 19:06, Malcolm McLean wrote:
      Baby X FS - the filing system - code that allows you to create a
    virtual drive on your computer and access files from it using special fopen(), fclose() functions, but standard library functions like
    fprintf() or fgetc() for the other operations

    I think people don't get is why they should use this filing system, when
    they already have a perfectly good one within their OS on which fopen()
    etc already work.

    When you do fclose() after writing a file, will it get written to some
    persistent media?

    Because either it stays in memory (dangerous if your machine crashes, or
    someone just turns it off), or it gets written to the same SSD/SD/HDD
    media that the real OS uses. In which case, what is the point?

    I gather this is not any kind of OS with its own drivers for the
    peripherals on the machine, that takes over the real OS, or runs as some
    kind of virtual machine.

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Fri Jun 14 22:29:00 2024
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 12:44, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 00:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 13/06/2024 19:01, bart wrote:

    And here it just gets even uglier. You also get situations like this:
    uint64_t i=0;
    printf("%lld\n", i);
    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.
    It can't tell you that you should be using one of those ludicrous macros.
    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    Exactly. We can't have this just to print out an integer.
    This is how C works. There's no point in moaning about it. Use another
    language or do what you have to in C.

    In Baby X I provide a function called bbx_malloc(). It is guaranteed
    never to return null. Currently it just calls exit() on allocation failure.
    But it also limits allocation to slightly under INT_MAX. Which should be
    plenty for a Baby program, and if you want more, you always have big boy's
    malloc.
    And if you need to change the size?

    But at a stroke, that gets rid of any need for size_t,
    But sizeof, strlen (and friends like the mbs... and wcs... functions),
    strspn (and friend), strftime, fread, fwrite. etc. etc. all return
    size_t.

    But these are not Baby X functions.
    Neither is malloc but you wanted to replace that to get rid of the need
    for size_t.
    I confess that I am all at sea about what you are doing. In essence, I
    don't understand the rules of the game so I should probably just stop
    commenting.

    Yes, I really need to get that website together so that people cotton on to what Baby X is, what it can and cannot do, and what is the point.

    I know what Baby X is. I don't know why "these are not Baby X
    functions" applies to the ones I listed and not to malloc.

    ...
    However if you need to pass a colour value to a function, you normally pass
    a BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
    query the channels using the macros in bbx_color.h

    #ifndef bbx_color_h
    #define bbx_color_h

    typedef unsigned long BBX_RGBA;


    Curious. The macros below seem to assume that int is 32 bits, so why
    use long?

    #define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

    This is likely to involve undefined behaviour when r >= 128. (I presume
    you are ruling out int narrower than 32 bits or there are other problems
    as well.)

    #define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
    #define bbx_red(col) ((col >> 24) & 0xFF)
    #define bbx_green(col) ((col >> 16) & 0xFF)
    #define bbx_blue(col) ((col >> 8) & 0xFF)
    #define bbx_alpha(col) (col & 0xFF)

    It might not be an issue (as col is opaque and unlikely to be an
    expression) but I'd still write (col) here to stop the reader having to
    check or reason that out.

    #define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

    #endif

    The last macro is to make it easier to interface with Xlib, and has the prefix BBX_ (upper case) indicating that it is for internal use by the bbx library / system and not meant for user programs.

    As a reader of the code, I made exactly the reverse assumption. When I
    see lower-case macros I assume they are for internal use.
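
A sketch of one way round the shift problem noted above: convert each channel to an unsigned 32-bit type before shifting, so that no signed overflow can occur. The `demo_` names and the choice of uint32_t instead of unsigned long are assumptions made for illustration; this is not the Baby X header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DEMO_RGBA;   /* hypothetical stand-in for BBX_RGBA */

/* Pack four 8-bit channels. The casts happen before the shifts, so
   r >= 128 no longer shifts a 1 into the sign bit of an int. */
static inline DEMO_RGBA demo_rgba(unsigned r, unsigned g,
                                  unsigned b, unsigned a)
{
    assert(r <= 255 && g <= 255 && b <= 255 && a <= 255);
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Unpack; as functions, the argument needs no extra parentheses. */
static inline unsigned demo_red(DEMO_RGBA col)   { return (col >> 24) & 0xFFu; }
static inline unsigned demo_green(DEMO_RGBA col) { return (col >> 16) & 0xFFu; }
static inline unsigned demo_blue(DEMO_RGBA col)  { return (col >> 8) & 0xFFu; }
static inline unsigned demo_alpha(DEMO_RGBA col) { return col & 0xFFu; }
```

The same casts could be kept inside the macro form instead; inline functions just make the argument types explicit.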

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Fri Jun 14 22:32:12 2024
    "Chris M. Thomasson" <[email protected]> writes:

    Fwiw, I remember doing a channel based hit map that stored an image using RGBA but used floats. Each pixel would have a hit:

    struct hit
    {
    float m_color[4];
    };

    It would take all of the hits and depending on what was going on during iteration it would increment parts of hit::m_color[4].

    Not in C you didn't!

    --
    Ben.

  • From DFS@21:1/5 to David Brown on Fri Jun 14 19:05:49 2024
    On 6/14/2024 1:18 PM, David Brown wrote:
    On 13/06/2024 16:38, DFS wrote:

    I write a little code every few days.  Mostly python.

    Certainly if I wanted to calculate some statistics from small data sets,
    I'd go for Python - I would not consider C unless it was for an
    embedded system.


    I like C for its blazing speed.  Very addicting.  And it's much more
    challenging/frustrating than python.

    With small data sets, Python has blazing speed - /every/ language has
    blazing speed.  And for large data sets, use numpy on Python and you
    /still/ have blazing speeds - a lot faster than anything you would write
    in C (because numpy's underlying code is written in C by people who are
    much better at writing fast numeric code than you or I).

    The only reason to use C for something like this is for the challenge and
    fun, which is fair enough.


    It was fun, especially when I got every stat to match the website exactly.


    I just now ported that C stats program to python. The original C took
    me ~2.5 days to write and test.

    The port to python then took about 2 hours.

    It mainly consisted of replacing printf with print, removing brackets
    {}, changing vars max and min to dmax and dmin, dropping the \n from
    printf's, replacing fabs() with abs(), etc.

    Line count dropped about 20%.


    During conversion, I got a Python error I don't remember seeing in the past:

    "TypeError: list indices must be integers or slices, not float"

    because division returns a float, and some of the array addressing was
    like this: nums[i/2].

    My initial fix was this clunk (convert to int()):

    # median and quartiles
    # quartiles divide sorted dataset into four sections
    # Q1 = median of values less than Q2
    # Q2 = median of the data set
    # Q3 = median of values greater than Q2
    if N % 2 == 0:
    Q2 = median = (nums[int((N/2)-1)] + nums[int(N/2)]) / 2.0
    i = int(N/2)
    if i % 2 == 0:
    Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
    Q3 = (nums[int(i + ((i-1)/2))] + nums[int(i+(i/2))]) / 2.0
    else:
    Q1 = nums[int((i-1)/2)]
    Q3 = nums[int(i + ((i-1)/2))]

    if N % 2 != 0:
    Q2 = median = nums[int((N-1)/2)]
    i = int((N-1)/2)
    if i % 2 == 0:
    Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
    Q3 = (nums[int(i + (i/2))] + nums[int(i + (i/2) + 1)]) / 2.0
    else:
    Q1 = nums[int((i-1)/2)]
    Q3 = nums[int(i + ((i+1)/2))]


    And then with some substitution:

    if N % 2 == 0:
    i = int(N/2)
    Q2 = median = (nums[i - 1] + nums[i]) / 2.0
    x = int(i/2)
    y = int((i-1)/2)
    if i % 2 == 0:
    Q1 = (nums[x - 1] + nums[x]) / 2.0
    Q3 = (nums[i + y] + nums[i + x]) / 2.0
    else:
    Q1 = nums[y]
    Q3 = nums[i + y]

    if N % 2 != 0:
    i = int((N-1)/2)
    Q2 = median = nums[i]
    x = int(i/2)
    y = int((i-1)/2)
    z = int((i+1)/2)
    if i % 2 == 0:
    Q1 = (nums[x - 1] + nums[x]) / 2.0
    Q3 = (nums[i + x] + nums[i + x + 1]) / 2.0
    else:
    Q1 = nums[y]
    Q3 = nums[i + z]


    How would you do it?


    If you have an easy to apply formula for computing the quartiles, let's
    hear it!
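
One possible answer, sketched in C to match the original program: every branch above is "take the median of a contiguous run", so a single median helper applied to the whole set and to each half covers all four cases. This assumes the data is already sorted and that, for odd N, the middle value belongs to neither half, which is what the index arithmetic above works out to. The function names are hypothetical. In Python the same integer division is spelled N // 2, which also removes the need for the int() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Median of the n sorted values starting at a[0]; n must be > 0. */
static double median_of(const double *a, size_t n)
{
    assert(a != NULL && n > 0);
    size_t mid = n / 2;                       /* integer division */
    return (n % 2 != 0) ? a[mid] : (a[mid - 1] + a[mid]) / 2.0;
}

/* Quartiles by the "median of each half" rule. For odd n the middle
   value is left out of both halves. Needs n >= 2 so both halves exist. */
void quartiles(const double *sorted, size_t n,
               double *q1, double *q2, double *q3)
{
    assert(sorted != NULL && n >= 2);
    size_t half = n / 2;                          /* size of each half */
    *q2 = median_of(sorted, n);
    *q1 = median_of(sorted, half);                /* first n/2 values */
    *q3 = median_of(sorted + (n - half), half);   /* last n/2 values */
}
```

For the values 1 to 8 this gives Q1 = 2.5, Q2 = 4.5 and Q3 = 6.5; for 1 to 7 it gives 2, 4 and 6. Other quartile conventions exist and produce slightly different numbers, so it is worth checking which one the reference site uses.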

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Jun 15 00:14:22 2024
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 22:29, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 12:44, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 00:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 13/06/2024 19:01, bart wrote:

    And here it just gets even uglier. You also get situations like this:
        uint64_t i=0;
        printf("%lld\n", i);
    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.
    It can't tell you that you should be using one of those
    ludicrous macros.
    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    Exactly. We can't have this just to print out an integer.
    This is how C works. There's no point in moaning about it. Use another
    language or do what you have to in C.

    In Baby X I provide a function called bbx_malloc(). It is guaranteed
    never to return null. Currently it just calls exit() on
    allocation failure.
    But it also limits allocation to slightly under INT_MAX. Which should be
    plenty for a Baby program, and if you want more, you always
    have big boy's malloc.
    And if you need to change the size?

    But at a stroke, that gets rid of any need for size_t,
    But sizeof, strlen (and friends like the mbs... and wcs... functions),
    strspn (and friend), strftime, fread, fwrite, etc. etc. all return
    size_t.

    But these are not Baby X functions.
    Neither is malloc but you wanted to replace that to get rid of the need
    for size_t.
    I confess that I am all at sea about what you are doing. In essence, I
    don't understand the rules of the game so I should probably just stop
    commenting.

    Yes, I really need to get that website together so that people cotton on to
    what Baby X is, what it can and cannot do, and what is the point.
    I know what Baby X is. I don't know why "these are not Baby X
    functions" applies to the ones I listed and not to malloc.
    ...
    However if you need to pass a colour value to a function, you normally pass a
    BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
    query the channels using the macros in bbx_color.h

    #ifndef bbx_color_h
    #define bbx_color_h

    typedef unsigned long BBX_RGBA;

    Curious. The macros below seem to assume that int is 32 bits, so why
    use long?

    Why use long?

    #define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))
    This is likely to involve undefined behaviour when r >= 128. (I presume
    you are ruling out int narrower than 32 bits or there are other problems
    as well.)

    No, it's been miswritten. Which is what I mean about C's integer types
    being a source of bugs. That code does not look buggy, but it is.

    I have no idea what this means. You start with "no" but I can't work
    out what you think is wrong about what I said. And what does "has been miswritten" mean? Both the tense and the use of "miswritten" are
    confusing to me. And, to me, the code does look "buggy".

    #define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
    #define bbx_red(col) ((col >> 24) & 0xFF)
    #define bbx_green(col) ((col >> 16) & 0xFF)
    #define bbx_blue(col) ((col >> 8) & 0xFF)
    #define bbx_alpha(col) (col & 0xFF)
    It might not be an issue (as col is opaque and unlikely to be an
    expression) but I'd still write (col) here to stop the reader having to
    check or reason that out.

    #define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

    #endif

    The last macro is to make it easier to interface with Xlib, and has the
    prefix BBX_ (upper case) indicating that it is for internal use by the bbx
    library / system and not meant for user programs.

    As a reader of the code, I made exactly the reverse assumption. When I
    see lower-case macros I assume they are for internal use.

    They're function-like macros. Iterating over an rgba buffer is very
    processor-intensive, and so we do have to compromise safety for speed
    here.

    I am not suggesting otherwise.

    All function-like symbols bbx_ are provided by Baby X for users, all
    symbols BBX_ have that prefix to reduce the chance of collisions with other code.

    Clearly. I'm not sure why you have reiterated this. I did not intend
    to change your mind, just to point out that it's the reverse of the
    common convention.

    --
    Ben.

  • From DFS@21:1/5 to Keith Thompson on Fri Jun 14 23:49:37 2024
    On 6/14/2024 9:39 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    [...]
    During conversion, I got a Python error I don't remember seeing in the past:
    "TypeError: list indices must be integers or slices, not float"

    because division returns a float, and some of the array addressing was
    like this: nums[i/2].
    [...]

    C's "/" operator yields a result with the type of the operands (after promotion to a common type).

    Python's "/" operator yields a floating-point result. For C-style
    integer division, Python uses "//". (Python 2 is more C-like.)
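
    The C side of that comparison can be sketched as below: with integer
    operands / truncates, so the quotient can be used directly as an array
    index. middle_index and halve are hypothetical helper names. Note that
    C truncates toward zero while Python's // rounds down, so the two
    differ for negative operands.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the middle (or upper-middle) element of an n-element array. */
static size_t middle_index(size_t n)
{
    assert(n > 0);
    return n / 2;          /* integer operands: the quotient is truncated */
}

/* With a floating-point operand the same operator yields a double. */
static double halve(double x)
{
    return x / 2;          /* 2 is converted to double before dividing */
}
```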


    I was surprised python did that, since every division used in the array addressing results in an integer.

    After casting i to an int before any array addressing, // works.

    Thanks

  • From DFS@21:1/5 to Keith Thompson on Sat Jun 15 00:45:24 2024
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    On 6/14/2024 9:39 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    [...]
    During conversion, I got a Python error I don't remember seeing in the past:

    "TypeError: list indices must be integers or slices, not float"

    because division returns a float, and some of the array addressing was
    like this: nums[i/2].
    [...]
    C's "/" operator yields a result with the type of the operands
    (after
    promotion to a common type).
    Python's "/" operator yields a floating-point result. For C-style
    integer division, Python uses "//". (Python 2 is more C-like.)

    I was surprised python did that, since every division used in the
    array addressing results in an integer.

    After casting i to an int before any array addressing, // works.

    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue. Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)



    But if you want help with your Python code, comp.lang.python is the
    place to ask.

    Thanks for your help, but David Brown is a Python developer and I'll ask
    him python questions here whenever I care to.

    In the recent past you were involved in discussions on perl, Fortran and
    awk, among other off-topics.

    Rules for thee but not for me?

  • From Janis Papanagnou@21:1/5 to DFS on Sat Jun 15 07:03:16 2024
    On 15.06.2024 06:45, DFS wrote:
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:

    After casting i to an int before any array addressing, // works.

    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue. Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)

    Given what Keith suggested, and assuming N is an integer, wouldn't it
    be more sensible to use the int division operator '//' and just write
    i = N // 2 ? I mean, why do a float division on integer operands and
    then again coerce the result to int again?

    Janis

  • From James Kuyper@21:1/5 to DFS on Sat Jun 15 01:05:16 2024
    On 6/15/24 00:45, DFS wrote:
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    On 6/14/2024 9:39 PM, Keith Thompson wrote:
    ...
    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue. Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)

    Alternatively, i = N//2

    But if you want help with your Python code, comp.lang.python is the
    place to ask.

    Thanks for your help, but David Brown is a Python developer and I'll ask
    him python questions here whenever I care to.

    Keep in mind that he's just one Python developer. With all due respect
    to David, you're likely to get better answers to your Python questions
    by going to a Python forum filled with Python developers.
    It's not about "following the rules" - rules are meaningless when
    enforcement is impossible, as it is in an unmoderated newsgroup like
    this one. It's about getting the best possible answer to your questions.
    If you prefer to get lower quality answers to your Python questions,
    continue asking them in forums where they are off-topic - but why would
    you prefer that?

  • From bart@21:1/5 to DFS on Sat Jun 15 09:37:06 2024
    On 15/06/2024 05:45, DFS wrote:
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    On 6/14/2024 9:39 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    [...]
    During conversion, I got a Python error I don't remember seeing in
    the past:

    "TypeError: list indices must be integers or slices, not float"

    because division returns a float, and some of the array addressing was
    like this: nums[i/2].
    [...]
    C's "/" operator yields a result with the type of the operands
    (after
    promotion to a common type).
    Python's "/" operator yields a floating-point result.  For C-style
    integer division, Python uses "//".  (Python 2 is more C-like.)

    I was surprised python did that, since every division used in the
    array addressing results in an integer.

    After casting i to an int before any array addressing, // works.

    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue.  Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)



    But if you want help with your Python code, comp.lang.python is the
    place to ask.

    Thanks for your help, but David Brown is a Python developer and I'll ask
    him python questions here whenever I care to.

    Yeah do that. Set up a private corner of comp.lang.c where David Brown
    has a sideline answering questions about Python from only one poster.

    Nobody else is allowed to answer.

    Sounds ridiculous, yes?


    In the recent past you were involved in discussions on perl, Fortran and
    awk, among other off-topics.

    Rules for thee but not for me?


  • From DFS@21:1/5 to Janis Papanagnou on Sat Jun 15 07:39:40 2024
    On 6/15/2024 1:03 AM, Janis Papanagnou wrote:
    On 15.06.2024 06:45, DFS wrote:
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:

    After casting i to an int before any array addressing, // works.

    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue. Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)

    Given what Keith suggested, and assuming N is an integer, wouldn't it
    be more sensible to use the int division operator '//' and just write
    i = N // 2 ? I mean, why do a float division on integer operands and
    then again coerce the result to int again?

    Python bytecode
    $ python3 -m dis file.py

    i = N//2
    1068 LOAD_NAME 12 (N)
    1070 LOAD_CONST 10 (2)
    1072 BINARY_FLOOR_DIVIDE
    1074 STORE_NAME 10 (i)

    i = int(N/2)
    1068 LOAD_NAME 11 (int)
    1070 LOAD_NAME 12 (N)
    1072 LOAD_CONST 10 (2)
    1074 BINARY_TRUE_DIVIDE
    1076 CALL_FUNCTION 1
    1078 STORE_NAME 10 (i)

    Fewer ops is better, so I'll go with your suggestion. Good catch.

  • From David Brown@21:1/5 to Malcolm McLean on Sat Jun 15 20:57:49 2024
    On 15/06/2024 00:35, Malcolm McLean wrote:
    On 14/06/2024 22:29, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 12:44, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 14/06/2024 00:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 13/06/2024 19:01, bart wrote:

    And here it just gets even uglier. You also get situations like this:
        uint64_t i=0;
        printf("%lld\n", i);
    This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
    and it complains the format should be %ld. Change it to %ld, and it
    complains under Windows.
    It can't tell you that you should be using one of those
    ludicrous macros.
    I've also just noticed that 'i' is unsigned but the format calls for
    signed. That may or may not be deliberate, but the compiler didn't say
    anything.

    Exactly. We can't have this just to print out an integer.
    This is how C works. There's no point in moaning about it. Use another
    language or do what you have to in C.

    In Baby X I provide a function called bbx_malloc(). It is guaranteed
    never to return null. Currently it just calls exit() on
    allocation failure.
    But it also limits allocation to slightly under INT_MAX. Which should be
    plenty for a Baby program, and if you want more, you always have
    big boy's malloc.
    And if you need to change the size?

    But at a stroke, that gets rid of any need for size_t,
    But sizeof, strlen (and friends like the mbs... and wcs... functions),
    strspn (and friend), strftime, fread, fwrite, etc. etc. all return
    size_t.

    But these are not Baby X functions.
    Neither is malloc but you wanted to replace that to get rid of the need
    for size_t.
    I confess that I am all at sea about what you are doing. In essence, I
    don't understand the rules of the game so I should probably just stop
    commenting.

    Yes, I really need to get that website together so that people cotton
    on to
    what Baby X is, what it can and cannot do, and what is the point.

    I know what Baby X is.  I don't know why "these are not Baby X
    functions" applies to the ones I listed and not to malloc.

    ...
    However if you need to pass a colour value to a function, you normally
    pass a BBX_RGBA value, which is typedefed to unsigned long, and is
    opaque, and you query the channels using the macros in bbx_color.h

    #ifndef bbx_color_h
    #define bbx_color_h

    typedef unsigned long BBX_RGBA;


    Curious.  The macros below seem to assume that int is 32 bits, so why
    use long?

    #define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

    This is likely to involve undefined behaviour when r >= 128.  (I presume
    you are ruling out int narrower than 32 bits or there are other problems
    as well.)


    No, it's been miswritten. Which is what I mean about C's integer types
    being a source of bugs. That code does not look buggy, but it is.

    #define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
    #define bbx_red(col) ((col >> 24) & 0xFF)
    #define bbx_green(col) ((col >> 16) & 0xFF)
    #define bbx_blue(col) ((col >> 8) & 0xFF)
    #define bbx_alpha(col) (col & 0xFF)

    It might not be an issue (as col is opaque and unlikely to be an
    expression) but I'd still write (col) here to stop the reader having to
    check or reason that out.

    #define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

    #endif

    The last macro is to make it easier to interface with Xlib, and has the
    prefix BBX_ (upper case) indicating that it is for internal use by
    the bbx
    library / system and not meant for user programs.

    As a reader of the code, I made exactly the reverse assumption.  When I
    see lower-case macros I assume they are for internal use.



    They're function-like macros. Iterating over an rgba buffer is very
    processor-intensive, and so we do have to compromise safety for speed
    here. All function-like symbols bbx_ are provided by Baby X for users,
    all symbols BBX_ have that prefix to reduce the chance of collisions
    with other code.


    In this little exchange, there have been several points where your code
    is unclear, inefficient, non-portable or downright buggy, purely due to
    your insistence in using an outdated version of C.

    If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer, write:

    typedef uint32_t BBX_RGBA;

    If you want bbx_rgba() to be a function that is typesafe, correct, and efficient (for any decent compiler), write :

        static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
                uint32_t b, uint32_t a)
        {
            return (r << 24) | (g << 16) | (b << 8) | a;
        }

    If you want your colour types to be "opaque", as you claimed, make it a
    struct with inline accessor functions.

    Use static inline functions instead of function-like macros and you
    don't need the extra parenthesis round things (and you don't need to
    justify to readers why they are not there). You can use small letter
    names without running contrary to common conventions.
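
    A minimal sketch of that suggestion, assuming a C99 or later compiler:
    the colour is wrapped in a struct so that callers cannot treat it as a
    plain integer, and inline functions replace the macros. The member name
    packed and the choice to assert on out-of-range channels are
    assumptions made for illustration; they are not taken from Baby X.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping the value in a struct stops it mixing silently with integers. */
typedef struct {
    uint32_t packed;       /* 0xRRGGBBAA; the member name is hypothetical */
} BBX_RGBA;

static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
    /* A run-time check is only one possible policy for bad input. */
    assert(r <= 255 && g <= 255 && b <= 255 && a <= 255);
    BBX_RGBA col = { (r << 24) | (g << 16) | (b << 8) | a };
    return col;
}

/* Accessors: no extra parentheses needed, arguments are evaluated once. */
static inline uint32_t bbx_red(BBX_RGBA col)   { return (col.packed >> 24) & 0xFF; }
static inline uint32_t bbx_green(BBX_RGBA col) { return (col.packed >> 16) & 0xFF; }
static inline uint32_t bbx_blue(BBX_RGBA col)  { return (col.packed >> 8) & 0xFF; }
static inline uint32_t bbx_alpha(BBX_RGBA col) { return col.packed & 0xFF; }
```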

    Your insistence on hobbling your choice of language shows through in the
    poor quality of the code - or at least, the missed opportunities to make
    the code better and safer for both you and your users.

  • From Richard Harnden@21:1/5 to David Brown on Sat Jun 15 20:27:25 2024
    On 15/06/2024 19:57, David Brown wrote:

    If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer, write:

        typedef uint32_t BBX_RGBA;

    If you want bbx_rgba() to be a function that is typesafe, correct, and efficient (for any decent compiler), write :

        static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
                uint32_t b, uint32_t a)
        {
            return (r << 24) | (g << 16) | (b << 8) | a;
        }


    Shouldn't that be ... ?

    static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
    uint8_t b, uint8_t a)

  • From David Brown@21:1/5 to Scott Lurndal on Sat Jun 15 22:15:09 2024
    On 14/06/2024 19:36, Scott Lurndal wrote:
    David Brown <[email protected]> writes:
    On 13/06/2024 16:38, DFS wrote:

    What programming language do you usually use? And why are you writing
    in C instead? (Or do you simply not do much programming?)

    I write a little code every few days.  Mostly python.

    Certainly if I wanted to calculate some statistics from small data sets,
    I'd go for Python - I would not consider C unless it was for an
    embedded system.

    I'd likely turn to R instead of Python for that.


    The only thing I know about R is that it would be a good choice for
    statistics code if I knew R. Since I don't know anything more about R,
    I'd go for Python :-)

  • From David Brown@21:1/5 to Keith Thompson on Sat Jun 15 22:13:24 2024
    On 14/06/2024 21:34, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 14/06/2024 00:58, Keith Thompson wrote:
    bart <[email protected]> writes:
    On 13/06/2024 16:39, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 13/06/2024 01:33, Keith Thompson wrote:
    If foo is an int, for example, printf lets you decide how to print
    it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
    in C23), upper vs. lower case for hex). Perhaps "print foo" in
    your language has similar features.

    C23 also adds explicit width length modifiers. So instead of having
    to guess if uint64_t is "%llu" or "%lu" on a particular platform, or
    using the PRIu64 macro, you can now use "%w64u" for uint64_t (or
    uint_least64_t if the exact width type does not exist). I think
    that's about as neat as you could get, within the framework of printf.

    Note that the new "%wN" modifier applies only to [u]intN_t and [u]int_leastN_t types, not to all integer types with a width of N bits.


    Yes, but by the definition of these types, if uintN_t exists then
    uint_leastN_t is the same type. (This is a new detail in C23.)

    The standard doesn't guarantee that integer types with the same representation are interchangeable, so for example printf("%d", 0L) and printf("%ld", 0) both have undefined behavior. An implementation would probably have to go out of its way to make either of those do anything
    other than printing "0", but the behavior is still undefined (i.e., the standard doesn't guarantee it will work).

    True. For C17, uint32_t and uint_least32_t could be different and
    incompatible types. It's highly unlikely, but possible. That was fixed
    in C23. (7.22.1.1p3)


    That's still the case in C23, even for the %wN modifiers. For a typical implementation with 32-bit integer types, uint32_t and uint_least32_t
    will be the same type (C17 doesn't require that), and "%w32u" will work
    with that type. It's not guaranteed to work with any other 32-bit
    unsigned type. For an implementation that doesn't have any 32-bit
    integer type, uint32_t won't exist, uint_least32_t will be, say, 64
    bits, and "%w32u" will work with *that* type.

    Yes, that is correct.

    It can be surprising for some people to hear that types with identical
    size and characteristics can still be incompatible. But at least with
    C23, we don't have to worry about that for the uintN_t and uint_leastN_t
    types. (The same applies to the signed versions.)

    If you want to use the bit width length modifiers in C23 printf, you
    might still have to cast your "int" or "long" data to an appropriate
    intN_t or int_leastN_t.
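
    For compilers or libraries without the C23 modifiers, the <inttypes.h>
    macros remain the portable route, and the cast mentioned above looks
    like this. A minimal sketch: format_u64 and format_int_as_i32 are
    hypothetical helpers, and the second assumes the int value fits in
    32 bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* uint64_t printed with the matching macro instead of guessing %lu vs %llu. */
static int format_u64(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%" PRIu64, value);
}

/* An int is converted to the exact type the conversion specifier expects.
   The caller is assumed to pass a value that fits in int32_t. */
static int format_int_as_i32(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%" PRId32, (int32_t)value);
}
```

    With a C23 library the first format string could instead be "%w64u",
    as described above; the argument type would still have to be uint64_t
    or uint_least64_t.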


    That covers the exact-width and "least" types. The "%wfN" modifiers
    cover the "fast" types.

    So if you want to use C23's new "%wN" modifiers, you have to use the
    types defined in <stdint.h> if you want to avoid undefined behavior.

    Yes. But if you want particular sizes for your types, that's a good
    idea anyway.

    On the other hand, though `int n = 42; printf("%w32d\n", n);` has
    undefined behavior, it's very very likely to work if int is 32 bits.
    (`gcc -Wformat` warns about using "%ld" with a long long argument
    even when long and long long have the same size, but not about using
    "%w32d" with a 32-bit int argument.)

    The new modifiers are supported in glibc 2.39, which is included in
    Ubuntu 24.04. They're not supported in newlib (used by Cygwin) or in
    MS Visual Studio 2022.

    [...]

    I wouldn't mind seeing a new kind of typedef that creates a new type
    rather than an alias. Then uint64_t could be a distinct type.
    That could cause some problems for _Generic, for example.

    I too would like such a typedef. Using it for uint64_t would cause
    problems for /existing/ uses of _Generic, but would make future uses
    better.

    Currently, there are (in the absence of extended integer types) only a
    finite number of incompatible integer types. This makes it possible to
    write a _Generic expression that accepts an operand of any integer type, which can be useful if you have an integer typedef and don't know the underlying type. This new kind of typedef would allow programmers to introduce an unlimited number of new incompatible integer types.


    Yes. But it would also allow you to make a "strong typedef" for a
    particular use and have a _Generic that distinguishes it. I believe I
    would find that more useful than the disadvantage you describe.
    (Perhaps it would be even better if it were possible to extend
    _Generic's, rather than cover all the types in one go.)

    I haven't seen a lot of code that does that kind of thing, and none
    that I didn't write myself.

    Perhaps if this is introduced, there should be a way to determine the underlying type. C23 introduces typeof and typeof_unqual; perhaps we
    could have typeof_underlying. It could also apply to enum types.


    Interesting idea.

  • From David Brown@21:1/5 to DFS on Sat Jun 15 22:22:09 2024
    On 15/06/2024 06:45, DFS wrote:
    On 6/14/2024 11:56 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    On 6/14/2024 9:39 PM, Keith Thompson wrote:
    DFS <[email protected]> writes:
    [...]
    During conversion, I got a Python error I don't remember seeing in
    the past:

    "TypeError: list indices must be integers or slices, not float"

    because division returns a float, and some of the array addressing was
    like this: nums[i/2].
    [...]
    C's "/" operator yields a result with the type of the operands
    (after
    promotion to a common type).
    Python's "/" operator yields a floating-point result.  For C-style
    integer division, Python uses "//".  (Python 2 is more C-like.)

    I was surprised python did that, since every division used in the
    array addressing results in an integer.

    After casting i to an int before any array addressing, // works.

    I'm surprised you needed to convert i to an int. I would think that
    just replacing nums[i/2] by nums[i//2] would do the trick,
    as long as i always has an int value (note Python's dynamic typing).
    If i is acquiring a float value, that's probably a bug, given the name.

    I spotted the issue.  Just prior to using i for array addressing I said:
    i = N/2.

    The fix is set i = int(N/2)



    But if you want help with your Python code, comp.lang.python is the
    place to ask.

    Thanks for your help, but David Brown is a Python developer and I'll ask
    him python questions here whenever I care to.


    I consider myself more of a C developer than a Python developer, but I
    use Python regularly. I would say that my knowledge of the C language
    and standard, while not as deep as some others here, covers a far higher proportion of the language than my knowledge of Python covers of Python.
    But I think you can make good use of Python while knowing a smaller
    fraction of the language and library than for C.

    In the recent past you were involved in discussions on perl, Fortran and
    awk, among other off-topics.

    Rules for thee but not for me?


    If occasional questions or discussions about other languages pop up
    here, people will often answer them. But for more in-depth discussions
    or questions, this is not the newsgroup - comp.lang.python is the place
    for Python questions. (You'll also probably get better answers there
    than I can give.)

    The rules are for everyone, but they are a bit fuzzy. (And different
    posters have different levels of fuzziness.)

  • From Ben Bacarisse@21:1/5 to Richard Harnden on Sat Jun 15 23:13:01 2024
    Richard Harnden <[email protected]d> writes:

    On 15/06/2024 19:57, David Brown wrote:
    If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer,
    write:
        typedef uint32_t BBX_RGBA;
    If you want bbx_rgba() to be a function that is typesafe, correct, and
    efficient (for any decent compiler), write :
        static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
                uint32_t b, uint32_t a)
        {
            return (r << 24) | (g << 16) | (b << 8) | a;
        }


    Shouldn't that be ... ?

    static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
    uint8_t b, uint8_t a)

    Maybe, but the function then needs more care as uint8_t will promote to
    int and then r << 24 can be undefined. One needs

    ((BBX_RGBA)r << 24) | (g << 16) | (b << 8) | a

    (assuming that int is never going to be 16 bits or the same issue comes
    up with the g << 16 shift). Given this assumption, I'd just check that
    unsigned int is at least 32 bits and use that for BBX_RGBA.
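
    Putting those two points together might look like the sketch below:
    unsigned int is checked at compile time to be at least 32 bits, and
    each channel is converted to BBX_RGBA before it is shifted, so that no
    shift is carried out in signed int. This assumes C11 (static_assert)
    and a platform that provides uint8_t; it is an illustration, not the
    Baby X source.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Compile-time check that the chosen type can hold four 8-bit channels. */
static_assert(UINT_MAX >= 0xFFFFFFFFu, "unsigned int must be at least 32 bits");

typedef unsigned int BBX_RGBA;

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    /* Convert first, then shift: the shifts use unsigned arithmetic,
       so r >= 128 no longer overflows a signed int. */
    return ((BBX_RGBA)r << 24) | ((BBX_RGBA)g << 16)
         | ((BBX_RGBA)b << 8)  | (BBX_RGBA)a;
}
```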

    --
    Ben.

  • From David Brown@21:1/5 to Richard Harnden on Sun Jun 16 12:53:38 2024
    On 15/06/2024 21:27, Richard Harnden wrote:
    On 15/06/2024 19:57, David Brown wrote:

    If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer,
    write:

         typedef uint32_t BBX_RGBA;

    If you want bbx_rgba() to be a function that is typesafe, correct, and
    efficient (for any decent compiler), write :

         static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
                 uint32_t b, uint32_t a)
         {
             return (r << 24) | (g << 16) | (b << 8) | a;
         }


    Shouldn't that be ... ?

    static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
            uint8_t b, uint8_t a)


    As Ben says, that will not work on its own - "r" would get promoted to
    signed int before the shift, and we are back to undefined behaviour.

    I think there is plenty of scope for improvement in a variety of ways, depending on what the author is looking for. For example, uint8_t might
    not exist on all platforms (indeed there are current processors that
    don't support it, not just dinosaur devices). But any system that
    supports a general-purpose gui, such as Windows or *nix systems, will
    have these types and will also have a 32-bit int. So the code author
    can balance portability with convenient assumptions.

    There are also balances to be found between run-time checking and
    efficiency, and how to handle bad data. If the function can assume that
    no one calls it with values outside 0..255, or that it doesn't matter
    what happens if such values are used, then you don't need any checks.
    As it stands, with uint32_t parameters, out-of-range values will lead to
    fully defined but wrong results. Switching to "uint8_t" types would
    give a different fully defined but wrong result. Maybe the function
    should use saturation, or run-time checks and error messages - that will
    depend on where it is in the API, what the code author wants, and what
    users expect.
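
    To make the two alternatives concrete, here is a rough sketch of a
    saturating version and a checked version. The names bbx_clamp255,
    bbx_rgba_sat and bbx_rgba_checked are hypothetical, and which of
    these (if either) is appropriate depends on the API considerations
    above:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BBX_RGBA;

/* Hypothetical helper: clamp a channel value to the range 0..255. */
static inline uint32_t bbx_clamp255(uint32_t v)
{
    return v > 255u ? 255u : v;
}

/* Saturating variant: out-of-range channels are clamped to 255. */
static inline BBX_RGBA bbx_rgba_sat(uint32_t r, uint32_t g,
        uint32_t b, uint32_t a)
{
    return (bbx_clamp255(r) << 24) | (bbx_clamp255(g) << 16)
         | (bbx_clamp255(b) << 8) | bbx_clamp255(a);
}

/* Checked variant: treats an out-of-range channel as a caller bug.
   The check is compiled out when NDEBUG is defined. */
static inline BBX_RGBA bbx_rgba_checked(uint32_t r, uint32_t g,
        uint32_t b, uint32_t a)
{
    assert(r <= 255u && g <= 255u && b <= 255u && a <= 255u);
    return (r << 24) | (g << 16) | (b << 8) | a;
}
```

    The saturating version always produces a valid colour at the cost of
    a few comparisons; the checked version costs nothing in release
    builds but then falls back to the "defined but wrong" behaviour.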

  • From Tim Rentsch@21:1/5 to Keith Thompson on Tue Jun 18 17:23:09 2024
    Keith Thompson <[email protected]> writes:

    (I'd like to see a future standard require plain char to be unsigned,
    but I don't know how likely that is.)

    It seems unnecessary given that the upcoming C standard
    is choosing to mandate two's complement for all signed
    integer types.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Sat Jun 22 09:28:14 2024
    Keith Thompson <[email protected]> writes:

    Tim Rentsch <[email protected]> writes:

    Keith Thompson <[email protected]> writes:

    (I'd like to see a future standard require plain char to be unsigned,
    but I don't know how likely that is.)

    It seems unnecessary given that the upcoming C standard
    is choosing to mandate two's complement for all signed
    integer types.

    It's less necessary, but I'd still like to see it.

    These days, strings very commonly hold UTF-8 data. The fact that bytes
    whose values exceed 127 are negative is conceptually awkward, even if
    everything happens to work. It rarely if ever makes sense to treat a
    character value as negative.

    The combination of mandating two's complement and using a compiler
    option like -funsigned-char (supported by both gcc and clang)
    should be enough to do what you want.
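
    Independently of the compiler option, code that inspects UTF-8 bytes
    can convert through unsigned char so that it behaves the same
    whether plain char is signed or unsigned. A small sketch, assuming
    8-bit chars and valid UTF-8 input; the function name utf8_count is
    hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a UTF-8 string.
   Assumes valid UTF-8, where continuation bytes look like 10xxxxxx. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Convert first: c is never negative, whatever the
           signedness of plain char. */
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0u) != 0x80u)
            n++;    /* not a continuation byte: starts a code point */
    }
    return n;
}
```

    Testing *s directly against 0x80 or 0xC0 would give different
    answers on signed-char and unsigned-char implementations; the
    conversion removes that dependency.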

    (And of course signed char still exists,
    or int8_t if you prefer 8 bits vs. CHAR_BIT bits.)

    It makes me laugh when people use int8_t instead of signed char.
    If CHAR_BIT isn't 8 then there won't be any int8_t. And of
    course we can always throw in a static assertion if it is felt
    necessary to protect against implementations that don't have
    8-bit chars. (A static assertion also can verify that two's
    complement is being used for signed char.)
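
    A sketch of what such static assertions might look like, using the
    C11 _Static_assert spelling. The messages are illustrative only.
    Since signed char has no padding bits, a minimum of -128 with 8-bit
    chars is possible only with two's complement:

```c
#include <limits.h>

/* Refuse to compile on implementations without 8-bit chars. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

/* With 8-bit chars, ones' complement and sign-magnitude would give
   SCHAR_MIN == -127, so -128 implies two's complement. */
_Static_assert(SCHAR_MIN == -128 && SCHAR_MAX == 127,
               "this code assumes two's complement signed char");
```

    If both assertions pass, signed char has the same width and range
    that int8_t is required to have.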

    A drawback is that it could break existing (non-portable) code that
    assumes plain char is signed.

    Exactly! No reason to break the whole world when you can get
    what you want just by using a compiler option.
