• filling area by color: stack safety

    From fir@21:1/5 to All on Sat Mar 16 05:11:44 2024
    I was writing a simple editor (something like Paint, but customized for my own needs) for big-pixel (low-resolution) drawing.

    Within a minute it became clear that I need a click that changes a
    drawn area of one color into another color (otherwise one would have
    to do it by hand, pixel by pixel, and the need to change the color of
    a given element is very common).

    There is a very simple method of doing it: I click on a pixel of a
    given color, replace it with my color, and call the same function on
    the 4 adjacent pixels (it only needs to check that the pixel is on
    screen at all and that the color to change is the initial color):

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                       unsigned new_color)
    {
        if(old_color == new_color) return 0;

        if(XYIsInScreen(x, y))
        if(GetPixelUnsafe(x,y)==old_color)
        {
            SetPixelSafe(x,y,new_color);
            RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
            return 1;
        }

        return 0;
    }

    It works, but I'm not quite sure how to estimate the safety of this.
    Incidentally, as I said, I use this editor for low-res graphics like
    200x200 pixels or less, and it is only a tool for private use, and
    I have no time to work on it for more than 1-3 days, I guess, but still:

    is there maybe a simple way to improve it?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Mar 16 13:55:03 2024
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    --
    Ben.

  • From bart@21:1/5 to Ben Bacarisse on Sat Mar 16 14:41:50 2024
    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    I personally find recursion hard work, and its errors much harder to debug.
    It also becomes much more important to show that it will not cause stack overflow.

  • From David Brown@21:1/5 to Malcolm McLean on Sat Mar 16 15:40:09 2024
    On 16/03/2024 12:33, Malcolm McLean wrote:
    This is a cheap and cheerful flood fill. And it's easy to get right and shouldn't fall over. But it makes an awful lot of unnecessary calls,
    and on a small system and large image might even blow the stack.

    It is not going to lead to stack overflow on any reasonable system. If
    the image size is 200 x 200, as the OP said, it will never reach a depth
    of more than 400 calls (the maximum path length before back-tracking is inevitable). Even for big images, I can't see it being a problem. I
    remember using the same method on a 16K ZX Spectrum as a teenager.


    Recursion makes programs harder to reason about and prove correct.

    As a general statement, that is simply wrong. It is no coincidence that
    most provably correct software development is done using functional
    programming languages, which are based entirely on recursion. Recursion
    maps well to inductive proofs, and avoids variables, and is thus often
    much easier to work with for proving code correct.


    So a real flood fill doesn't work like that. You use a queue and put the pixels to be filled into that, and trace lines.


    That might be a bit more efficient, but not significantly so (at least,
    not in your implementation below). You are using a queue instead of the
    stack, but it will grow in exactly the same manner.

    And here's some code I wrote a while ago. Use that as a pattern. But not
    sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove correct
    than the OP's original one, and unlikely to be very much faster (it will certainly scale in the same way in both time and memory usage).

    There are a variety of different flood-fill algorithms, with different advantages and disadvantages. Speeds will often depend as much on the
    way the get/set pixel code works, especially if the flood-fill is on
    live displayed data rather than in a buffer off-screen. But typically
    you need to get a /lot/ more advanced (i.e., not your algorithm) to
    improve on the OP's version by an order of magnitude, so if speed is not essential but understanding that it is correct is important, then it
    makes more sense to stick to the original recursive version.
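
    As an illustration of one of the other flood-fill variants alluded to
    above, here is a minimal span-based sketch in C. It is neither
    Malcolm's linked code nor fir's function: the flat row-major `img`
    buffer, the `Seed` type and the names `span_fill` and `push_runs` are
    all assumptions made for the example. It recolours whole horizontal
    runs at a time and keeps pending work in a heap-allocated array
    rather than on the call stack.

```c
#include <stdlib.h>

/* Hypothetical types and names; img is w*h bytes, row-major. */
typedef struct { int x, y; } Seed;

/* Push one seed for each run of old_color pixels in row y, columns x0..x1. */
static size_t push_runs(const unsigned char *img, int w, int y,
                        int x0, int x1, unsigned char old_color,
                        Seed *stack, size_t top)
{
    int in_run = 0;
    for (int x = x0; x <= x1; ++x) {
        if (img[(size_t)y * w + x] == old_color) {
            if (!in_run) stack[top++] = (Seed){x, y};
            in_run = 1;
        } else {
            in_run = 0;
        }
    }
    return top;
}

/* Returns the number of pixels recoloured, or -1 if allocation fails. */
long span_fill(unsigned char *img, int w, int h,
               int x, int y, unsigned char new_color)
{
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    unsigned char old_color = img[(size_t)y * w + x];
    if (old_color == new_color) return 0;

    /* A filled span of length n pushes at most n + 1 seeds (rows above and
       below), and spans never overlap, so 2*w*h + 1 entries are enough. */
    Seed *stack = malloc(((size_t)w * h * 2 + 1) * sizeof *stack);
    if (!stack) return -1;

    size_t top = 0;
    long filled = 0;
    stack[top++] = (Seed){x, y};

    while (top > 0) {
        Seed s = stack[--top];
        unsigned char *row = img + (size_t)s.y * w;
        if (row[s.x] != old_color) continue;  /* already filled via another seed */

        /* Extend the span left and right from the seed. */
        int x0 = s.x, x1 = s.x;
        while (x0 > 0 && row[x0 - 1] == old_color) --x0;
        while (x1 < w - 1 && row[x1 + 1] == old_color) ++x1;

        for (int i = x0; i <= x1; ++i) row[i] = new_color;
        filled += x1 - x0 + 1;

        /* Seed the rows directly above and below the span. */
        if (s.y > 0)
            top = push_runs(img, w, s.y - 1, x0, x1, old_color, stack, top);
        if (s.y < h - 1)
            top = push_runs(img, w, s.y + 1, x0, x1, old_color, stack, top);
    }
    free(stack);
    return filled;
}
```

    Whether this is actually faster than the recursive version depends,
    as noted above, largely on how expensive the get/set pixel
    operations are; the sketch only shows the shape of the technique.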

  • From bart@21:1/5 to fir on Sat Mar 16 19:13:32 2024
    On 16/03/2024 04:11, fir wrote:
    It works, but I'm not quite sure how to estimate the safety of this.

    On my machine, it's OK up to a 400x400 image (starting with all one
    colour and filling from the centre with another colour).

    At 500x500, I get stack overflow. At 400x400 the maximum recursion
    depth is 80,000 calls.

    I don't have an alternative ATM; I'm just reporting what I saw with my
    test program shown below, since some here don't believe that recursion
    can be problematic.



    --------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef unsigned char byte;

    enum {dimx=400};
    enum {dimy=dimx};
    byte image[dimx][dimy];
    int maxdepth;               /* deepest recursion level seen */

    byte getpixel(int x, int y) {
        return image[x][y];
    }

    void setpixel(int x, int y, byte newcol) {
        image[x][y]=newcol;
    }

    int onscreen(int x, int y) {
        return x>=0 && x<dimx && y>=0 && y<dimy;
    }

    void fill(int x, int y, unsigned old_color, unsigned new_color)
    {
        static int depth=0;     /* current recursion level */

        if(old_color == new_color) return;

        ++depth;
        if (depth>maxdepth) maxdepth=depth;

        if(onscreen(x, y)) {
            //printf("FILL %d %d %d depth:%d\n",x,y, onscreen(x,y), depth);
            if(getpixel(x,y)==old_color)
            {
                setpixel(x,y,new_color);
                fill(x+1, y, old_color, new_color);
                fill(x-1, y, old_color, new_color);
                fill(x, y-1, old_color, new_color);
                fill(x, y+1, old_color, new_color);
            }
        }
        --depth;
    }

    /* Write the image as a plain-text (P2) greymap. */
    static void writepgm(const char* file) {
        int x, y;
        FILE* f = fopen(file,"w");
        if (!f) { perror(file); return; }
        fprintf(f,"%s\n","P2");
        fprintf(f,"%d %d\n",dimx,dimy);
        fprintf(f,"255\n");
        for (y=0; y<dimy; ++y) {
            for (x=0; x<dimx; ++x) {
                fprintf(f,"%d ",getpixel(x,y));
            }
            fprintf(f,"\n");
        }
        fclose(f);
    }

    int main(void) {

        fill(dimx/2, dimy/2, 0, 80);

        printf("maxdepth=%d\n",maxdepth);
        puts("");

        puts("Writing test.ppm:");
        writepgm("test.ppm");

    }

  • From Scott Lurndal@21:1/5 to David Brown on Sat Mar 16 18:25:37 2024
    David Brown <[email protected]> writes:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern. But not
    sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

    Your implementation is a mess, /vastly/ more difficult to prove correct

    Malcolm can't even spell 'integer' correctly in that code blob :-).

    Certainly the intent of Fir's algorithm is easily discerned from
    his code. I can't say that about Malcolm's.

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Sat Mar 16 18:21:38 2024
    Malcolm McLean <[email protected]> writes:
    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?


    Example given. A recursive algorithm which is hard to reason about and

    Perhaps hard for _you_ to reason about. That doesn't
    generalize to every other programmer that might read that
    code.

  • From bart@21:1/5 to bart on Sat Mar 16 20:23:57 2024
    On 16/03/2024 19:13, bart wrote:
    On 16/03/2024 04:11, fir wrote:
    It works, but I'm not quite sure how to estimate the safety of this.

    On my machine, it's OK up to a 400x400 image (starting with all one
    colour and filling from the centre with another colour).

    At 500x500, I get stack overflow. At 400x400 the maximum recursion
    depth is 80,000 calls.

    For an NxN image filling from the centre, the max depth is N*N/2, or
    from one corner, it's N*N.

    The depth with an N*1 image starting from one end seems to be just N.

    It appears to fill as much as possible (in my tests, all remaining
    pixels), before returning from any call, at which point, the work is done.

    I've just looked in my Computer Graphics Principles and Practice book
    (after blowing off the dust), and the algorithm above is exactly the 'FloodFill4' one in the book. It mentions the problems with the stack;
    maybe I should have looked in there first.

    It talks about better approaches, but it doesn't give a better algorithm
    that I can see. Perhaps the OP should just do an online search for one.
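
    One commonly used way around the call-depth problem described above
    is to keep the pending pixels in a heap-allocated array instead of on
    the call stack. The following is only a sketch under assumptions, not
    the algorithm from the book: the flat row-major `img` buffer, the
    `Point` type and the name `flood_fill4` are hypothetical. It is still
    4-connected, like the original, but its memory use is an explicit
    allocation of w*h entries that can fail cleanly rather than overflow.

```c
#include <stdlib.h>

/* Hypothetical layout: img is w*h bytes, row-major, one byte per pixel. */
typedef struct { int x, y; } Point;

/* Returns the number of pixels recoloured, or -1 if allocation fails. */
long flood_fill4(unsigned char *img, int w, int h,
                 int x, int y, unsigned char new_color)
{
    static const int dx[4] = {1, -1, 0, 0};
    static const int dy[4] = {0, 0, -1, 1};

    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    unsigned char old_color = img[(size_t)y * w + x];
    if (old_color == new_color) return 0;

    /* A pixel is recoloured at the moment it is pushed, so it can be
       pushed at most once and w*h entries always suffice. */
    Point *stack = malloc((size_t)w * h * sizeof *stack);
    if (!stack) return -1;

    size_t top = 0;
    long filled = 0;
    img[(size_t)y * w + x] = new_color;
    stack[top++] = (Point){x, y};

    while (top > 0) {
        Point p = stack[--top];
        ++filled;
        for (int i = 0; i < 4; ++i) {
            int nx = p.x + dx[i], ny = p.y + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            unsigned char *px = &img[(size_t)ny * w + nx];
            if (*px != old_color) continue;
            *px = new_color;                 /* mark before pushing */
            stack[top++] = (Point){nx, ny};
        }
    }
    free(stack);
    return filled;
}
```

    With this shape, a 500x500 single-colour image filled from the centre
    needs one allocation of 250,000 `Point` entries and no deep recursion.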

  • From Ben Bacarisse@21:1/5 to bart on Sun Mar 17 10:42:37 2024
    bart <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.
    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    No, which is why I did not make such an assertion.

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Scott Lurndal on Sun Mar 17 10:31:18 2024
    [email protected] (Scott Lurndal) writes:

    David Brown <[email protected]> writes:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern. But not
    sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove correct

    Malcolm can't even spell 'integer' correctly in that code blob :-).

    As someone with dyslexia I have never liked mocking remarks about
    spelling errors. Using "even" suggests that a superficial issue hints
    at deeper problems. This is rarely the case.

    However, I /would/ urge Malcolm to correct the spelling of Bresenham
    since the intent was clearly to credit the discoverer. Also,
    misspellings don't play well with library databases.

    Certainly the intent of Fir's algorithm is easily discerned from
    his code. I can't say that about Malcolm's.

    I have some reservations about the code, but he posted a link so there
    is no indication that he wants a review of it.

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Sun Mar 17 11:25:00 2024
    Malcolm McLean <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.
    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    Example given. A recursive algorithm which is hard to reason about and
    prove correct, because we don't really know whether under perfectly reasonable assumptions it will or will not blow the stack.

    Had you offered a proof that your code neither "blows the stack" nor
    runs out of any other resource we'd have a starting point for
    comparison, but you have not done that.

    Mind you, had you done that, we would have something that might
    eventually become only one piece of evidence for what is an
    astonishingly general remark. Broadly applicable remarks require either broadly applicable evidence or a wealth of distinct cases.

    Your "rule" suggests that all reasoning is impeded by the presence of
    recursion and I don't think you can support that claim. This is
    characteristic of many of your remarks -- they are general "rules" that
    often remain rules even when there is evidence to the contrary.

    I'll make another point in the hope of clarifying the matter. An
    algorithm or code is usually proved correct (or not!) under the
    assumption that it has adequate resources -- usually time and storage.
    Further reasoning may then be done to determine the resource
    requirements since this is so often dependent on context. This
    separation is helpful as you don't usually want to tie "correctness" to
    some specific installation. The code might run on a system with a
    dynamically allocated stack, for example, that has very similar
    limitations to "heap" memory.

    To put it more generally, we often want to prove properties of code that
    are independent of physical constraints. Your remark includes this kind
    of reasoning. Did you intend it to?

    --
    Ben.

  • From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 14:46:25 2024
    On Sat, 16 Mar 2024 11:33:20 +0000
    Malcolm McLean <[email protected]> wrote:

    This is a cheap and cheerful flood fill. And it's easy to get right
    and shouldn't fall over.

    Except I don't understand why it works at all.
    Can't the fill area have sub-areas that are only connected through a diagonal?

  • From bart@21:1/5 to Malcolm McLean on Sun Mar 17 12:49:33 2024
    On 17/03/2024 12:28, Malcolm McLean wrote:
    The main intent was to help fir. That algorithm does tend to blow the
    stack, though of course it depends on the image. However, the worst case
    is a pattern which is a pixel-wide line, e.g. a spiral or a maze or a
    series of alternate light and dark bands with little gaps at each end.
    And you achieve that by filling half the pixels. So for a 100x100 image
    your worst case is 10,000 / 2 = 5,000 recursive calls, and the stack is
    blown.

    I'd been planning to create a square spiral. But I found I got N*N/2
    behaviour even with a blank image which was filled in from one corner.

    After thinking about it, it became obvious that the potential call depth
    wasn't the distance to a boundary in any direction, but the number of
    pixels in the area to be eventually filled in, which could be a big
    chunk of the N*N total.

  • From bart@21:1/5 to Michael S on Sun Mar 17 12:54:34 2024
    On 17/03/2024 12:46, Michael S wrote:
    Except I don't understand why it works at all.
    Can't the fill area have sub-areas that are only connected through a diagonal?

    Suppose you have an image which is a chessboard. You want to fill one of
    the black squares so that it is red.

    If you allow connectivity through the diagonals (so two notionally
    square pixels that only meet at their corners would be connected), then
    all the black squares would turn red, not just one.

  • From Peter 'Shaggy' Haywood@21:1/5 to All on Sun Mar 17 18:03:57 2024
    Groovy hepcat fir was jivin' in comp.lang.c on Sat, 16 Mar 2024 03:11
    pm. It's a cool scene! Dig it.

    i was writing simple editor (something like paint but more custom for
    my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area of
    of one color into another color (becouse if no someone would need to
    do
    it by hand pixel by pixel and the need to change color of given
    element is very common)

    Not really a C question, but I'll forgive that for now.
    What you're looking for (and can easily find on Google, Duck Duck Go
    or any other search engine, if you but utilise any of those services)
    is called a "flood fill" algorithm.
    But a word of advice: recursion can be tricky if you don't understand
    the effect. Your method creates a very large recursive chain. This is
    best avoided. Try it out "by hand". Get a piece of graph paper and draw
    some shapes on it, including some complex ones. Now choose one of these
    shapes and choose a starting pixel within this area and try applying
    your algorithm. With a coloured pencil, colour in each square as you
    go, just as the algorithm would. Also make note of the level of
    recursion as you go. I think you'll be amazed. Repeat for all the
    shapes on your graph paper.

    --


    ----- Dig the NEW and IMPROVED news sig!! -----


    -------------- Shaggy was here! ---------------
    Ain't I'm a dawg!!

  • From Michael S@21:1/5 to Scott Lurndal on Sun Mar 17 15:32:03 2024
    On Sat, 16 Mar 2024 18:21:38 GMT
    [email protected] (Scott Lurndal) wrote:

    Malcolm McLean <[email protected]> writes:
    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.


    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?


    Example given. A recursive algorithm which is hard to reason about
    and

    Perhaps hard for _you_ to reason about. That doesn't
    generalize to every other programmer that might read that
    code.



    As a matter of fact, David Brown was not able to reason about the depth
    of recursion in fir's code. And you answered David's post without
    spotting his mistake.
    Now, I don't know if you didn't spot his mistake because you didn't read
    this part of his message or because for you too it was hard to reason
    about the depth of recursion.

  • From Michael S@21:1/5 to bart on Sun Mar 17 15:15:19 2024
    On Sun, 17 Mar 2024 12:54:34 +0000
    bart <[email protected]> wrote:

    Suppose you have an image which is a chessboard. You want to fill one
    of the black squares so that it is red.

    If you allow connectivity through the diagonals (so two notionally
    square pixels that only meet at their corners would be connected),
    then all the black squares would turn red, not just one.


    That's what I want.
    Does fir want something else?

  • From bart@21:1/5 to Michael S on Sun Mar 17 13:23:55 2024
    On 17/03/2024 13:15, Michael S wrote:
    On Sun, 17 Mar 2024 12:54:34 +0000
    bart <[email protected]> wrote:

    On 17/03/2024 12:46, Michael S wrote:
    On Sat, 16 Mar 2024 11:33:20 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more custom
    for my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed
    area of of one color into another color (becouse if no someone
    would need to do it  by hand pixel by pixel and the need to
    change color of given element is very common)

    there is very simple method of doing it - i men i click in given
    color pixel then replace it by my color and call the same function
    on adjacent 4 pixels (only need check if it is in screen at all
    and if the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
    old_color, unsigned new_color)
    {
      if(old_color == new_color) return 0;

      if(XYIsInScreen( x,  y))
      if(GetPixelUnsafe(x,y)==old_color)
      {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
        return 1;
      }

      return 0;
    }

    it work but im not quite sure how to estimate the safety of this
    - incidentally as i said i use this editor to low res graphics
    like 200x200 pixels or less, and it is only a toll of private use,
    yet i got no time to work on it more than 1-2-3 days i guess but
    still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get right
    and shouldn't fall over.

    Except I don't understand why it works at all.
    Can't a fill area have sub-areas that are only connected through
    a diagonal?

    Suppose you have an image which is a chessboard. You want to fill one
    of the black squares so that it is red.

    If you allow connectivity through the diagonals (so two notionally
    square pixels that only meet at their corners would be connected),
    then all the black squares would turn red, not just one.


    That's what I want.
    Does fir want something else?


    His algorithm is the same as that presented in my textbook, where it is
    called FloodFill4.

    If I reread the notes I see now the significance of the '4', as it talks
    about 4-connected and 8-connected versions.

    Presumably you want the 8-connected version, which will have 4 extra
    calls for the pixels at each corner.
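
    A minimal sketch of the 8-connected variant described above. It assumes a row-major `unsigned char` buffer rather than the original poster's `GetPixelUnsafe`/`SetPixelSafe` helpers, and the name `flood_fill8` is hypothetical. The only change from the 4-connected version is that the four diagonal neighbours are visited as well.

```c
/* Hypothetical 8-connected recursive fill on a row-major buffer.
   Returns the number of pixels recoloured. */
int flood_fill8(unsigned char *img, int width, int height,
                int x, int y,
                unsigned char old_color, unsigned char new_color)
{
    int count = 1;

    if (old_color == new_color)
        return 0;                       /* nothing to do, and avoids endless recursion */
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;                       /* outside the image */
    if (img[y * width + x] != old_color)
        return 0;                       /* not part of the region */

    img[y * width + x] = new_color;

    /* Visit all eight neighbours: the four edge-adjacent ones
       plus the four that touch only at a corner. */
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            if (dx != 0 || dy != 0)
                count += flood_fill8(img, width, height,
                                     x + dx, y + dy, old_color, new_color);
    return count;
}
```

    On a chessboard of one-pixel squares this recolours every square of the seed's colour, which is the behaviour the chessboard example describes. It has the same recursion-depth concern as the 4-connected version.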

  • From Michael S@21:1/5 to bart on Sun Mar 17 15:37:42 2024
    On Sun, 17 Mar 2024 13:23:55 +0000
    bart <[email protected]> wrote:

    On 17/03/2024 13:15, Michael S wrote:
    On Sun, 17 Mar 2024 12:54:34 +0000
    bart <[email protected]> wrote:

    On 17/03/2024 12:46, Michael S wrote:
    On Sat, 16 Mar 2024 11:33:20 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more
    custom for my eventual needs) for big pixel (low resolution)
    drawing

    it showed in a minute i need a click for changing given drawed
    area of of one color into another color (becouse if no someone
    would need to do it  by hand pixel by pixel and the need to
    change color of given element is very common)

    there is very simple method of doing it - i men i click in given
    color pixel then replace it by my color and call the same
    function on adjacent 4 pixels (only need check if it is in
    screen at all and if the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
    old_color, unsigned new_color)
    {
      if(old_color == new_color) return 0;

      if(XYIsInScreen( x,  y))
      if(GetPixelUnsafe(x,y)==old_color)
      {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
        return 1;
      }

      return 0;
    }

    it work but im not quite sure how to estimate the safety of this
    - incidentally as i said i use this editor to low res graphics
    like 200x200 pixels or less, and it is only a toll of private
    use, yet i got no time to work on it more than 1-2-3 days i
    guess but still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get
    right and shouldn't fall over.

    Except I don't understand why it works at all.
    Can't a fill area have sub-areas that are only connected through
    a diagonal?

    Suppose you have an image which is a chessboard. You want to fill
    one of the black squares so that it is red.

    If you allow connectivity through the diagonals (so two notionally
    square pixels that only meet at their corners would be connected),
    then all the black squares would turn red, not just one.


    That's what I want.
    Does fir want something else?


    His algorithm is the same as that presented in my textbook, where it
    is called FloodFill4.

    If I reread the notes I see now the significance of the '4', as it
    talks about 4-connected and 8-connected versions.

    Presumably you want the 8-connected version, which will have 4 extra
    calls for the pixels at each corner.



    '4' variant does not appear useful for changing colors of drawn shapes,
    like lines or circles. Nor would it work for changing color of text
    except when font is unusually bold.

  • From Ben Bacarisse@21:1/5 to bart on Sun Mar 17 14:10:15 2024
    bart <[email protected]> writes:

    His algorithm is the same as that presented in my textbook, where it is called FloodFill4.

    s/my/his/?

    --
    Ben.

  • From Lew Pitcher@21:1/5 to Malcolm McLean on Sun Mar 17 14:27:53 2024
    On Sat, 16 Mar 2024 11:33:20 +0000, Malcolm McLean wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more custom for my
    eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area of
    of one color into another color (becouse if no someone would need to do
    it  by hand pixel by pixel and the need to change color of given element
    is very common)

    there is very simple method of doing it - i men i click in given color
    pixel then replace it by my color and call the same function on adjacent
    4 pixels (only need check if it is in screen at all and if the color to
    change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
      if(old_color == new_color) return 0;

      if(XYIsInScreen( x,  y))
      if(GetPixelUnsafe(x,y)==old_color)
      {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1,  old_color, new_color);
        return 1;
      }

      return 0;
    }

    it work but im not quite sure how to estimate the safety of this  -
    incidentally as i said i use this editor to low res graphics  like
    200x200 pixels or less, and it is only a toll of private use,
    yet i got no time to work on it more than 1-2-3 days i guess but still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get right and
    shouldn't fall over. But it makes an awful lot of unnecessary calls,
    and on a small system and large image might even blow the stack.

    Recursion makes programs harder to reason about and prove correct.

    I would have said that those unfamiliar with the concept of recursion
    have a harder time reasoning about the effects of recursion, or proving
    their recursive code correct.

    Take fir's example code above; a simple single call to RecolorizePixelAndAdjacentOnes() will effectively recolour the
    origin cell multiple times, because of how the recursion is handled.

    As an example:
    RecolorizePixelAndAdjacentOnes(0,0,1,2)
    will
    SetPixelSafe(0,0,2);
    then invoke
    RecolorizePixelAndAdjacentOnes(1,0,1,2)
    which will
    SetPixelSafe(1,0,2)
    and subsequently invoke
    ...
    RecolorizePixelAndAdjacentOnes(0,0,1,2)
    which will
    SetPixelSafe(0,0,2);
    and then invoke
    RecolorizePixelAndAdjacentOnes(1,0,1,2)
    etc.


    [snip]

    --
    Lew Pitcher
    "In Skills We Trust"

  • From bart@21:1/5 to Lew Pitcher on Sun Mar 17 15:13:18 2024
    On 17/03/2024 14:27, Lew Pitcher wrote:
    On Sat, 16 Mar 2024 11:33:20 +0000, Malcolm McLean wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more custom for my
    eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area of
    of one color into another color (becouse if no someone would need to do
    it  by hand pixel by pixel and the need to change color of given element
    is very common)

    there is very simple method of doing it - i men i click in given color
    pixel then replace it by my color and call the same function on adjacent
    4 pixels (only need check if it is in screen at all and if the color to
    change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
      if(old_color == new_color) return 0;

      if(XYIsInScreen( x,  y))
      if(GetPixelUnsafe(x,y)==old_color)
      {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1,  old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1,  old_color, new_color);
        return 1;
      }

      return 0;
    }

    it work but im not quite sure how to estimate the safety of this  -
    incidentally as i said i use this editor to low res graphics  like
    200x200 pixels or less, and it is only a toll of private use,
    yet i got no time to work on it more than 1-2-3 days i guess but still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get right and
    shouldn't fall over. But it makes an awful lot of unnecessary calls,
    and on a small system and large image might even blow the stack.

    Recursion makes programs harder to reason about and prove correct.

    I would have said that those unfamiliar with the concept of recursion
    have a harder time reasoning about the effects of recursion, or proving
    their recursive code correct.

    Take fir's example code above; a simple single call to RecolorizePixelAndAdjacentOnes() will effectively recolour the
    origin cell multiple times, because of how the recursion is handled.

    I don't think so. It may look at the original cell, but it will only
    recolour it (and recursively process its neighbours) if the colour
    hasn't yet changed to the new one.

    If I take a 100x100 image with 10,000 cells, which all have to be filled
    to the new colour, then SetPixelSafe is called exactly 10,000 times.

    The problem is that most of the work is done along a 10,000-deep chain
    of nested calls.
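
    A minimal sketch for checking both claims by measurement rather than by argument. It assumes a row-major `unsigned char` buffer in place of the original pixel helpers; `fill4_counted` and `struct fill_stats` are hypothetical names. It counts writes to the image and records the deepest chain of nested calls, using the same neighbour order as the original post.

```c
/* Hypothetical instrumentation for the 4-connected recursive fill. */
struct fill_stats {
    long pixels_set;   /* number of writes to the image */
    long max_depth;    /* deepest chain of nested calls seen */
};

void fill4_counted(unsigned char *img, int width, int height,
                   int x, int y,
                   unsigned char old_color, unsigned char new_color,
                   long depth, struct fill_stats *stats)
{
    if (old_color == new_color)
        return;
    if (x < 0 || x >= width || y < 0 || y >= height)
        return;
    if (img[y * width + x] != old_color)
        return;             /* already recoloured, or never part of the region */

    img[y * width + x] = new_color;
    stats->pixels_set++;
    if (depth > stats->max_depth)
        stats->max_depth = depth;

    /* Same neighbour order as the original post: right, left, up, down. */
    fill4_counted(img, width, height, x + 1, y, old_color, new_color, depth + 1, stats);
    fill4_counted(img, width, height, x - 1, y, old_color, new_color, depth + 1, stats);
    fill4_counted(img, width, height, x, y - 1, old_color, new_color, depth + 1, stats);
    fill4_counted(img, width, height, x, y + 1, old_color, new_color, depth + 1, stats);
}
```

    Called with `depth` set to 1 on a uniform image, `pixels_set` ends up equal to the number of pixels, because a pixel that has already changed colour fails the `old_color` test. `max_depth` shows how much of the work happens inside nested calls; the exact figure depends on the seed position and the neighbour order.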

  • From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 17:42:18 2024
    On Sun, 17 Mar 2024 14:56:34 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 15:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern.
    But not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Borwn, ot its it actaully ture?


    And I need to run some tests, don't I?


    Let's give it a whirl


    <snip>

    malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
    malcolm@Malcolms-iMac cscratch % ./a.out
    floodfill_r 1.69274
    floodfill4 0.336705



    Now try the case in which original recursion is particularly deep.
    Something like that:
    *.***.**
    *.*.*.*.
    *.*.*.*.
    *.*.*.*.
    *.*.*.*.
    *.*.*.*.
    *.*.*.*.
    ***.***.

  • From David Brown@21:1/5 to Malcolm McLean on Sun Mar 17 17:45:16 2024
    On 16/03/2024 16:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern. But
    not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Borwn, ot its it actaully ture?

    I don't know who "David Borwn" might be, nor what "ture" means. If you
    can't type, and can't spell, then at least pay the group the respect of
    using a spell-checker.

    It's not designed to be easy to prove correct, that's true. And the
    main reason it's a mess is that we are managing the queue manually for speed.

    It is badly designed code. It is a jumble of wildly different concepts,
    thrown together in one huge function with no structure or organisation,
    and with meaningless names for the variables and absurd names for the parameters.

    The OP's code is simple and obvious, as is its correctness (assuming
    reasonable definitions of the pixel access and setting functions) and
    its time and space requirements. Yours is not.

    Your algorithm could be used in a proper implementation, with separate functions to handle the different parts (such as the stack). The
    algorithm itself is not bad, it's the implementation that is the main
    problem.


    But the naive recursive algorithm is O(N) (N = pixels to flood), and inherently we can't beat that without special hardware.

    Assuming you are measuring the number of pixels read or written here,
    then that is, I think, correct.

    The recursive
    one tends to be slow because calls are expensive.

    Yes, I agree that recursion can be slow (unless it is simple enough for
    the compiler to turn it into a loop). And it typically takes more stack
    space than you'd need for a dedicated queue. But whether or not that
    makes a significant difference depends on the code in question, and how
    much work you are doing within the code. If each step of the algorithm takes
    a lot of time anyway, the call overhead will be of less relevance.

    I would expect that your code would be several times faster than the
    OP's, with similar scaling. But the OP's is understandable and easily
    seen to be correct, unlike yours, and correctness trumps speed every time.

    And starting from a correct recursive version, it's possible to improve
    on it in many ways while retaining correctness.

    And mine makes calls
    to malloc() and realloc to manage the queue. And of course whilst we
    might blow the stack, we are much less likely to run out of heap.

    True.
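
    A minimal sketch of the stack-versus-heap trade-off discussed above: the same 4-connected fill, but with the pending pixels kept in a heap-allocated array that grows with `realloc` instead of in call frames. It assumes a row-major `unsigned char` buffer; `flood_fill4_heap` is a hypothetical name and this is not the linked library's implementation.

```c
#include <stdlib.h>

/* Hypothetical iterative 4-connected fill with an explicit, heap-allocated stack.
   Returns the number of pixels recoloured, or -1 if memory runs out
   (in which case the image may be partially filled). */
int flood_fill4_heap(unsigned char *img, int width, int height,
                     int x, int y,
                     unsigned char old_color, unsigned char new_color)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    size_t capacity = 64, top = 0;
    int *stack;
    int count;

    if (old_color == new_color)
        return 0;
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (img[y * width + x] != old_color)
        return 0;

    stack = malloc(capacity * sizeof *stack);
    if (stack == NULL)
        return -1;

    /* Recolour a pixel when it is pushed, so each pixel is pushed at most once. */
    img[y * width + x] = new_color;
    stack[top++] = y * width + x;
    count = 1;

    while (top > 0) {
        int pos = stack[--top];
        int px = pos % width, py = pos / width;

        for (int i = 0; i < 4; i++) {
            int nx = px + dx[i], ny = py + dy[i];

            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            if (img[ny * width + nx] != old_color)
                continue;
            if (top == capacity) {          /* grow the stack on the heap */
                int *bigger = realloc(stack, 2 * capacity * sizeof *stack);
                if (bigger == NULL) {
                    free(stack);
                    return -1;
                }
                stack = bigger;
                capacity *= 2;
            }
            img[ny * width + nx] = new_color;
            stack[top++] = ny * width + nx;
            count++;
        }
    }

    free(stack);
    return count;
}
```

    The structure mirrors the recursive version closely, which keeps it easy to check, and running out of memory becomes a reportable error rather than a crash.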


    And it's been tweaked a bit in a hacky way to make it faster on real
    images. And whilst it's still going to work, is it out of date?


    I have no idea if your code is "out of date" or not. It seems to be
    written for images consisting of unsigned chars, so I am not sure it was
    ever designed for real-world images.

    What is clear is that you have taken an okay algorithm - not state of
    the art, but not the worst - and made a dog's breakfast of an
    implementation in your attempts to micro-optimise. This means you have
    code that can't be easily analysed or seen to be correct, cannot be
    improved algorithmically, and cannot be expanded or gain new features
    without a massive re-write.

    And I need to run some tests, don't I?


    If you like.



    There are a variety of different flood-fill algorithms, with different
    advantages and disadvantages.  Speeds will often depend as much on the
    way the get/set pixel code works, especially if the flood-fill is on
    live displayed data rather than in a buffer off-screen.  But typically
    you need to get a /lot/ more advanced (i.e., not your algorithm) to
    improve on the OP's version by an order of magnitude, so if speed is
    not essential but understanding that it is correct is important, then
    it makes more sense to stick to the original recursive version.


    What are these / lot / more advanced algorithms? Maybe they exist. But
    don't people deserve some sort of link?


    <https://gprivate.com/6a2yp>

    I don't know if it is fair to call them a /lot/ more advanced, but
    certainly a bit more advanced. And certainly better implementations are possible.

  • From David Brown@21:1/5 to Ben Bacarisse on Sun Mar 17 17:59:39 2024
    On 17/03/2024 11:31, Ben Bacarisse wrote:
    [email protected] (Scott Lurndal) writes:

    David Brown <[email protected]> writes:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern. But not
    sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove correct

    Malcolm can't even spell 'integer' correctly in that code blob :-).

    As someone with dyslexia I have never liked mocking remarks about
    spelling errors. Using "even" suggests that a superficial issue hints
    at deeper problems. This is rarely the case.

    However, I /would/ urge Malcolm to correct the spelling of Bresenham
    since the intent was clearly to credit the discoverer. Also,
    misspellings don't play well with library databases.


    I also have dyslexia. I am dependent on a spell checker to spell
    accurately. And that is one of the reasons why Malcolm should do a
    better job of writing accurately - it is much easier for others to read
    posts when the spelling and the grammar are correct. I would, of course,
    also like him to do a better job at his grammar, but using a
    spell-checker is so simple and low-cost that it is inexcusable for him
    not to use one.

    I have nothing bad to say about people who can't spell well, or who
    can't type well, and I have nothing but respect for people who are
    trying their best to write in a second (or third, or more) language.

    But Malcolm is a native English speaker with a degree in English. He is
    not dyslexic (or at least, the mistakes he makes are not typical signs
    of dyslexia). He is simply a poor typist and too lazy to make an effort
    to correct his errors.


    Certainly the intent of Fir's algorithm is easily discerned from
    his code. I can't say that about Malcolm's.

    I have some reservations about the code, but he posted a link so there
    is no indication that he wants a review of it.


  • From David Brown@21:1/5 to Michael S on Sun Mar 17 18:05:22 2024
    On 17/03/2024 14:37, Michael S wrote:
    On Sun, 17 Mar 2024 13:23:55 +0000
    bart <[email protected]> wrote:

    On 17/03/2024 13:15, Michael S wrote:
    On Sun, 17 Mar 2024 12:54:34 +0000
    bart <[email protected]> wrote:

    On 17/03/2024 12:46, Michael S wrote:
    On Sat, 16 Mar 2024 11:33:20 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more
    custom for my eventual needs) for big pixel (low resolution)
    drawing

    it showed in a minute i need a click for changing given drawed
    area of of one color into another color (becouse if no someone
    would need to do it  by hand pixel by pixel and the need to
    change color of given element is very common)

    there is very simple method of doing it - i men i click in given
    color pixel then replace it by my color and call the same
    function on adjacent 4 pixels (only need check if it is in
    screen at all and if the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
    old_color, unsigned new_color)
    {
      if(old_color == new_color) return 0;

      if(XYIsInScreen( x,  y))
      if(GetPixelUnsafe(x,y)==old_color)
      {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
        return 1;
      }

      return 0;
    }

    it work but im not quite sure how to estimate the safety of this
    - incidentally as i said i use this editor to low res graphics
    like 200x200 pixels or less, and it is only a toll of private
    use, yet i got no time to work on it more than 1-2-3 days i
    guess but still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get
    right and shouldn't fall over.

    Except I don't understand why it works at all.
    Can't a fill area have sub-areas that are only connected through
    a diagonal?

    Suppose you have an image which is a chessboard. You want to fill
    one of the black squares so that it is red.

    If you allow connectivity through the diagonals (so two notionally
    square pixels that only meet at their corners would be connected),
    then all the black squares would turn red, not just one.


    That's what I want.
    Does fir want something else?


    His algorithm is the same as that presented in my textbook, where it
    is called FloodFill4.

    If I reread the notes I see now the significance of the '4', as it
    talks about 4-connected and 8-connected versions.

    Presumably you want the 8-connected version, which will have 4 extra
    calls for the pixels at each corner.



    '4' variant does not appear useful for changing colors of drawn shapes,
    like lines or circles. Nor would it work for changing color of text
    except when font is unusually bold.


    An 8-connected flood fill is typically too much, and a 4-connected flood
    fill is often too little. Neither is perfect for all cases, but I think
    the 4-connected version is the most commonly used.

    And as they stand, both are useless for images that come from
    realistic pictures (as distinct from drawings), since real colours
    change gradually.

    That's why graphics programs have feathered selection and masking, fuzzy
    fills, and so on.
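
    A minimal sketch of one simple form of "fuzzy" fill, assuming a greyscale row-major buffer: a pixel joins the region if its value is within a tolerance of the seed pixel's value. The names `fuzzy_fill4` and `fuzzy_visit` are hypothetical, and real graphics programs may define tolerance differently. A separate visited mask is needed because the replacement value could itself fall within the tolerance.

```c
#include <stdlib.h>

/* Hypothetical helper: recursive 4-connected visit with a visited mask. */
static void fuzzy_visit(unsigned char *grey, unsigned char *visited,
                        int width, int height, int x, int y,
                        int seed, int tolerance,
                        unsigned char new_value, int *count)
{
    int pos;

    if (x < 0 || x >= width || y < 0 || y >= height)
        return;
    pos = y * width + x;
    if (visited[pos])
        return;                              /* already handled */
    if (abs((int)grey[pos] - seed) > tolerance)
        return;                              /* too different from the seed */

    visited[pos] = 1;
    grey[pos] = new_value;
    ++*count;

    fuzzy_visit(grey, visited, width, height, x + 1, y, seed, tolerance, new_value, count);
    fuzzy_visit(grey, visited, width, height, x - 1, y, seed, tolerance, new_value, count);
    fuzzy_visit(grey, visited, width, height, x, y - 1, seed, tolerance, new_value, count);
    fuzzy_visit(grey, visited, width, height, x, y + 1, seed, tolerance, new_value, count);
}

/* Returns the number of pixels replaced, or -1 if the mask cannot be allocated. */
int fuzzy_fill4(unsigned char *grey, int width, int height,
                int x, int y, int tolerance, unsigned char new_value)
{
    unsigned char *visited;
    int count = 0;

    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    visited = calloc((size_t)width * (size_t)height, 1);
    if (visited == NULL)
        return -1;

    fuzzy_visit(grey, visited, width, height, x, y,
                grey[y * width + x], tolerance, new_value, &count);
    free(visited);
    return count;
}
```

    With a tolerance of 0 this behaves like the exact-match 4-connected fill. It shares the recursion-depth concern of the original.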

  • From Kaz Kylheku@21:1/5 to Malcolm McLean on Sun Mar 17 17:11:44 2024
    On 2024-03-17, Malcolm McLean <[email protected]> wrote:
    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited whilst stacks are not.

    The strawman, absolutist conventional wisdom that "recursion is always
    easier to analyze than iteration" is wrong in the first place.

    Any program graph based on nothing but IF and GOTO primitives can be mechanically transliterated into a (tail) recursive structure that has
    the same shape, and is no easier to understand.

    Your point is not very well made, though. Even though recursion may run
    into a resource limit, its structure can still help analyze the logic of
    the algorithm apart from that resource issue. The resource issue can be separately analyzed and provisions made for the algorithm to handle the required inputs, and reject others.

    Most algorithms (especially ones working with all inputs in memory)
    are constrained by resources. The iterative version of that image
    processing algorithm might handle larger images than the recursive
    one, but there are yet image sizes it won't handle.

    The idea of calling algorithm implementations "incorrect" if they have
    any limitations on their input sizes and such isn't particularly
    informative or useful.

    Obviously it is incorrect if something has limitations, and is used
    in such a way that they are exceeded. E.g. the C <int> + <int>
    operation when a result is implied that is beyond INT_MIN or INT_MAX.
    Oops, + is not "correct"; don't use it!
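
    A minimal sketch of what "reject inputs beyond the limits" can look like for the `<int> + <int>` case, using only standard C. The function name `checked_add` is hypothetical; the point is that the limitation is checked before the operation instead of declaring `+` unusable.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the
   mathematical result fits in an int; returns false (leaving *out
   untouched) when it would fall outside INT_MIN..INT_MAX. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;       /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;       /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

    The same pattern applies to the flood fill: decide which image sizes the implementation has to handle, and refuse the rest up front.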

    Now, there is a bit of value in algorithms that will successfully
    operate on any object that was successfully fit into memory. Do
    these really exist though? Pretty much any algorithm implementation
    requires some space to do its work, even if that space is small and
    fixed. It's possible that the input fit into memory, yet that small and
    fixed amount of space is not available.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 18:25:20 2024
    On Sun, 17 Mar 2024 14:56:34 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 15:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern.
    But not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Borwn, ot its it actaully ture?


    And I need to run some tests, don't I?

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int floodfill_r(unsigned char *grey, int width, int height, int x, int y,
                    unsigned char target, unsigned char dest)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            return 0;
        if (grey[y*width+x] != target)
            return 0;
        grey[y*width+x] = dest;
        floodfill_r(grey, width, height, x - 1, y, target, dest);
        floodfill_r(grey, width, height, x + 1, y, target, dest);
        floodfill_r(grey, width, height, x, y - 1, target, dest);
        floodfill_r(grey, width, height, x, y + 1, target, dest);

        return 0;
    }

    /**
      Floodfill4 - floodfill, 4 connectivity.

      @param[in,out] grey - the image (formally it's greyscale but it
                            could be binary or indexed)
      @param width - image width
      @param height - image height
      @param x - seed point x
      @param y - seed point y
      @param target - the colour to flood
      @param dest - the colour to replace it by.
      @returns Number of pixels flooded.
    */
    int floodfill4(unsigned char *grey, int width, int height, int x, int y,
                   unsigned char target, unsigned char dest)
    {
        int *qx = 0;
        int *qy = 0;
        int qN = 0;
        int qpos = 0;
        int qcapacity = 0;
        int wx, wy;
        int ex, ey;
        int tx, ty;
        int ix;
        int *temp;
        int answer = 0;

        if(grey[y * width + x] != target)
            return 0;
        qx = malloc(width * sizeof(int));
        qy = malloc(width * sizeof(int));
        if(qx == 0 || qy == 0)
            goto error_exit;
        qcapacity = width;
        qx[qpos] = x;
        qy[qpos] = y;
        qN = 1;

        while(qN != 0)
        {
            tx = qx[qpos];
            ty = qy[qpos];
            qpos++;
            qN--;

            if(qpos == 256)
            {
                memmove(qx, qx + 256, qN*sizeof(int));
                memmove(qy, qy + 256, qN*sizeof(int));
                qpos = 0;
            }
            if(grey[ty*width+tx] != target)
                continue;
            wx = tx;
            wy = ty;
            while(wx >= 0 && grey[wy*width+wx] == target)
                wx--;
            wx++;
            ex = tx;
            ey = ty;
            while(ex < width && grey[ey*width+ex] == target)
                ex++;
            ex--;


            for(ix=wx;ix<=ex;ix++)
            {
                grey[ty*width+ix] = dest;
                answer++;
            }

            if(ty > 0)
                for(ix=wx;ix<=ex;ix++)
                {
                    if(grey[(ty-1)*width+ix] == target)
                    {
                        if(qpos + qN == qcapacity)
                        {
                            temp = realloc(qx, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qx = temp;
                            temp = realloc(qy, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qy = temp;
                            qcapacity += width;
                        }
                        qx[qpos+qN] = ix;
                        qy[qpos+qN] = ty-1;
                        qN++;
                    }
                }
            if(ty < height -1)
                for(ix=wx;ix<=ex;ix++)
                {
                    if(grey[(ty+1)*width+ix] == target)
                    {
                        if(qpos + qN == qcapacity)
                        {
                            temp = realloc(qx, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qx = temp;
                            temp = realloc(qy, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qy = temp;
                            qcapacity += width;
                        }
                        qx[qpos+qN] = ix;
                        qy[qpos+qN] = ty+1;
                        qN++;
                    }
                }
        }

        free(qx);
        free(qy);

        return answer;
    error_exit:
        free(qx);
        free(qy);
        return -1;
    }

    int main(void)
    {
        unsigned char *image;
        clock_t tick, tock;
        int i;

        image = malloc(100 * 100);
        tick = clock();
        for (i = 0 ; i < 10000; i++)
        {
            memset(image, 0, 100 * 100);
            floodfill_r(image, 100, 100, 50, 50, 0, 1);
        }
        tock = clock();
        printf("floodfill_r %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

        tick = clock();
        for (i = 0 ; i < 10000; i++)
        {
            memset(image, 0, 100 * 100);
            floodfill4(image, 100, 100, 50, 50, 0, 1);
        }
        tock = clock();
        printf("floodfill4 %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

        return 0;
    }


    Let's give it a whirl

    malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
    malcolm@Malcolms-iMac cscratch % ./a.out
    floodfill_r 1.69274
    floodfill4 0.336705



    I find your performance measurement non-decisive for two reasons:
    (1) because your test case is too trivial and probably uncharacteristic
    and
    (2) because recursive variant could be trivially rewritten in a way
    that reduces # of stack memory accesses by factor of 2 or 3.
    Like that:

    struct recursive_context_t {
        unsigned char *grey;
        int width, height;
        unsigned char target, dest;
    };

    static void floodfill_r_core(const struct recursive_context_t* context,
                                 int x, int y) {
        if (x < 0 || x >= context->width || y < 0 || y >= context->height)
            return;
        if (context->grey[y*context->width+x] == context->target) {
            context->grey[y*context->width+x] = context->dest;
            floodfill_r_core(context, x - 1, y);
            floodfill_r_core(context, x + 1, y);
            floodfill_r_core(context, x, y - 1);
            floodfill_r_core(context, x, y + 1);
        }
    }

    int floodfill_r(
        unsigned char *grey,
        int width, int height,
        int x, int y,
        unsigned char target, unsigned char dest)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            return 0;
        if (grey[y*width+x] != target)
            return 0;
        struct recursive_context_t context = {
            .grey = grey,
            .width = width,
            .height = height,
            .target = target,
            .dest = dest,
        };
        floodfill_r_core(&context, x, y);
        return 1;
    }

  • From Michael S@21:1/5 to Michael S on Sun Mar 17 19:39:08 2024
    On Sun, 17 Mar 2024 18:25:20 +0200
    Michael S <[email protected]> wrote:

    On Sun, 17 Mar 2024 14:56:34 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 15:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern.
    But not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Brown, or is it actually
    true?

    And I need to run some tests, don't I?

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int floodfill_r(unsigned char *grey, int width, int height, int x,
    int y, unsigned char target, unsigned char dest)
    {
    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;
    if (grey[y*width+x] != target)
    return 0;
    grey[y*width+x] = dest;
    floodfill_r(grey, width, height, x - 1, y, target, dest);
    floodfill_r(grey, width, height, x + 1, y, target, dest);
    floodfill_r(grey, width, height, x, y - 1, target, dest);
    floodfill_r(grey, width, height, x, y + 1, target, dest);

    return 0;
    }

    /**
    Floodfill4 - floodfill, 4 connectivity.

    @param[in,out] grey - the image (formally it's greyscale but it
    could be binary or indexed)
    @param width - image width
    @param height - image height
    @param x - seed point x
    @param y - seed point y
    @param target - the colour to flood
    @param dest - the colour to replace it by.
    @returns Number of pixels flooded.
    */
    int floodfill4(unsigned char *grey, int width, int height, int x,
    int y, unsigned char target, unsigned char dest)
    {
    int *qx = 0;
    int *qy = 0;
    int qN = 0;
    int qpos = 0;
    int qcapacity = 0;
    int wx, wy;
    int ex, ey;
    int tx, ty;
    int ix;
    int *temp;
    int answer = 0;

    if(grey[y * width + x] != target)
    return 0;
    qx = malloc(width * sizeof(int));
    qy = malloc(width * sizeof(int));
    if(qx == 0 || qy == 0)
    goto error_exit;
    qcapacity = width;
    qx[qpos] = x;
    qy[qpos] = y;
    qN = 1;

    while(qN != 0)
    {
    tx = qx[qpos];
    ty = qy[qpos];
    qpos++;
    qN--;

    if(qpos == 256)
    {
    memmove(qx, qx + 256, qN*sizeof(int));
    memmove(qy, qy + 256, qN*sizeof(int));
    qpos = 0;
    }
    if(grey[ty*width+tx] != target)
    continue;
    wx = tx;
    wy = ty;
    while(wx >= 0 && grey[wy*width+wx] == target)
    wx--;
    wx++;
    ex = tx;
    ey = ty;
    while(ex < width && grey[ey*width+ex] == target)
    ex++;
    ex--;


    for(ix=wx;ix<=ex;ix++)
    {
    grey[ty*width+ix] = dest;
    answer++;
    }

    if(ty > 0)
    for(ix=wx;ix<=ex;ix++)
    {
    if(grey[(ty-1)*width+ix] == target)
    {
    if(qpos + qN == qcapacity)
    {
    temp = realloc(qx, (qcapacity + width) * sizeof(int));
    if(temp == 0)
    goto error_exit;
    qx = temp;
    temp = realloc(qy, (qcapacity + width) * sizeof(int));
    if(temp == 0)
    goto error_exit;
    qy = temp;
    qcapacity += width;
    }
    qx[qpos+qN] = ix;
    qy[qpos+qN] = ty-1;
    qN++;
    }
    }
    if(ty < height -1)
    for(ix=wx;ix<=ex;ix++)
    {
    if(grey[(ty+1)*width+ix] == target)
    {
    if(qpos + qN == qcapacity)
    {
    temp = realloc(qx, (qcapacity + width) * sizeof(int));
    if(temp == 0)
    goto error_exit;
    qx = temp;
    temp = realloc(qy, (qcapacity + width) * sizeof(int));
    if(temp == 0)
    goto error_exit;
    qy = temp;
    qcapacity += width;
    }
    qx[qpos+qN] = ix;
    qy[qpos+qN] = ty+1;
    qN++;
    }
    }
    }

    free(qx);
    free(qy);

    return answer;
    error_exit:
    free(qx);
    free(qy);
    return -1;
    }

    int main(void)
    {
    unsigned char *image;
    clock_t tick, tock;
    int i;

    image = malloc(100 * 100);
    tick = clock();
    for (i = 0 ; i < 10000; i++)
    {
    memset(image, 0, 100 * 100);
    floodfill_r(image, 100, 100, 50, 50, 0, 1);
    }
    tock = clock();
    printf("floodfill_r %g\n", ((double)(tock -
    tick))/CLOCKS_PER_SEC);

    tick = clock();
    for (i = 0 ; i < 10000; i++)
    {
    memset(image, 0, 100 * 100);
    floodfill4(image, 100, 100, 50, 50, 0, 1);
    }
    tock = clock();
    printf("floodfill4 %g\n", ((double)(tock -
    tick))/CLOCKS_PER_SEC);

    return 0;
    }


    Let's give it a whirl

    malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
    malcolm@Malcolms-iMac cscratch % ./a.out
    floodfill_r 1.69274
    floodfill4 0.336705



    I find your performance measurement non-decisive for two reasons:
    (1) because your test case is too trivial and probably
    uncharacteristic and
    (2) because recursive variant could be trivially rewritten in a way
    that reduces # of stack memory accesses by factor of 2 or 3.
    Like that:

    struct recursive_context_t {
    unsigned char *grey;
    int width, height;
    unsigned char target, dest;
    };

    static void floodfill_r_core(const struct recursive_context_t*
    context, int x, int y) {
    if (x < 0 || x >= context->width || y < 0 || y >= context->height)
    return;
    if (context->grey[y*context->width+x] == context->target) {
    context->grey[y*context->width+x] = context->dest;
    floodfill_r_core(context, x - 1, y);
    floodfill_r_core(context, x + 1, y);
    floodfill_r_core(context, x, y - 1);
    floodfill_r_core(context, x, y + 1);
    }
    }

    int floodfill_r(
    unsigned char *grey,
    int width, int height,
    int x, int y,
    unsigned char target, unsigned char dest)
    {
    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;
    if (grey[y*width+x] != target)
    return 0;
    struct recursive_context_t context = {
    .grey = grey,
    .width = width,
    .height = height,
    .target = target,
    .dest = dest,
    };
    floodfill_r_core(&context, x, y);
    return 1;
    }



    I did my own measurements with the snake-like image from my first
    response to Malcolm. For this shape, the recursive version (after my
    improvement) is almost exactly 10 times slower than Malcolm's iterative
    code. It is also susceptible to stack overflow, although a little less
    so than the original.
    Even if in the Big Oh sense they are the same, it does look like
    Malcolm's variant is decisively faster in practice.

  • From Ben Bacarisse@21:1/5 to Spiros Bousbouras on Sun Mar 17 22:14:04 2024
    Spiros Bousbouras <[email protected]> writes:

    On Sun, 17 Mar 2024 14:10:15 +0000
    Ben Bacarisse <[email protected]> wrote:
    bart <[email protected]> writes:

    His algorithm is the same as that presented in my textbook, where it is
    called FloodFill4.

    s/my/his/?

    What is mentioned in <ut4v4r$32mgb$[email protected]> : "I've just looked in my Computer Graphics Principles and Practice book" .

    That context seems to have got lost, and MM was quoting from his
    textbook (or book at any rate). Thanks for pointing out the missing
    context.

    --
    Ben.

  • From bart@21:1/5 to Ben Bacarisse on Sun Mar 17 22:21:28 2024
    On 17/03/2024 22:14, Ben Bacarisse wrote:
    Spiros Bousbouras <[email protected]> writes:

    On Sun, 17 Mar 2024 14:10:15 +0000
    Ben Bacarisse <[email protected]> wrote:
    bart <[email protected]> writes:

    His algorithm is the same as that presented in my textbook, where it is
    called FloodFill4.

    s/my/his/?

    What is mentioned in <ut4v4r$32mgb$[email protected]> : "I've just looked in
    my Computer Graphics Principles and Practice book" .

    That context seems to have got lost, and MM was quoting from his
    textbook (or book at any rate). Thanks for pointing out the missing
    context.


    'His' algorithm was the one in the OP posted by fir.

    'My' (bart's) textbook was the CGPP book by Foley, van Dam, et al.

  • From David Brown@21:1/5 to Malcolm McLean on Mon Mar 18 07:58:44 2024
    On 17/03/2024 19:28, Malcolm McLean wrote:
    On 17/03/2024 16:45, David Brown wrote:
    On 16/03/2024 16:09, Malcolm McLean wrote:

    The OP's code is simple and obvious, as is its correctness (assuming
    reasonable definitions of the pixel access and setting functions) and
    its time and space requirements.  Yours is not.

    Except it is not. You didn't give the right answer for the space requirements.

    Unfortunately, I am still fallible - /easier/ does not mean I'll get it
    right :-( And I apologise for unhelpfully rushing that and getting it
    wrong.

    However, I stand by my claim that the recursive version is much easier
    to analyse.

    Your algorithm could be used in a proper implementation, with separate
    functions to handle the different parts (such as the stack).  The
    algorithm itself is not bad, it's the implementation that is the main
    problem.

    It's better to have one function. Subroutines have a way of getting lost.

    Seriously? "Subroutines get lost" ? So your answer is to put all your
    ideas in a mixer and scrunch them up until any semblance of logic and
    structure is lost, and the code works (if it does) by trial and error?
    And then the whole mess is cut-and-paste duplicated - along with any
    subtle bugs it might have - for 8-connected version. And that's better,
    in your eyes, than re-using code?

    I have no idea if your code is "out of date" or not.  It seems to be
    written for images consisting of unsigned chars, so I am not sure it
    was ever designed for real-world images.

    It was written a long time ago. But it is written in a conservative
    subset of ANSI C, and so of course it still works, and should work for
    a long time to come. But the 256 integer queue tweak might be out of
    date. And cache use is far more important now than it was on big
    processors. So it might be a bit long in the tooth.


    I have been most interested in being able to be sure the algorithm is
    correct, rather than considering its absolute (rather than "big O")
    efficiency in different systems. It is certainly the case that cache
    considerations are more relevant now than they used to be on many
    systems. And for working on PC's, you would likely dispense with your
    growing stack entirely and simply allocate a queue big enough for every
    pixel in the image.

    And it's part of the binary image library, and it's designed for marking
    8- or 4- connected sections of those images by setting the 1 to a
    different value. And then further processing. The binary images are
    often derived from photographs by Otsu thresholding, which is in the
    same library. But they aren't usually meant for human viewing by end users.

    I don't know if it is fair to call them a /lot/ more advanced, but
    certainly a bit more advanced.  And certainly better implementations
    are possible.

    And are you going to be constructive or not? Suggest one which might be
    better? Even implement it?


    I suggested separating the code into functions - that is /definitely/
    constructive. I suggested using sensible names for parameters and
    variables (well, the suggestion was implied by my criticism).

    And I am also suggesting now that you allocate a queue that is big
    enough for every pixel in the image. Much of what you don't touch of
    that space, will probably never be physically allocated by the OS,
    depending on page sizes and free memory.

    And I would also suggest you drop the requirement for coding in an
    ancient tongue, and instead switch to reasonably modern C. Make
    abstractions for the types and the access functions - it will make the
    code far easier to follow, easier to show correct, and easier to modify
    and reuse, without affecting efficiency at run-time.
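
    A minimal sketch of what these suggestions could look like together,
    not a rewrite of Malcolm's routine: a plain pixel queue sized for the
    whole image, with small accessor functions. The names image_t,
    in_bounds, get_pixel, set_pixel, visit and flood_fill_full_queue are
    all hypothetical, and the pixel type is assumed to be unsigned char
    as in the code under discussion.

```c
#include <stdlib.h>

/* Hypothetical image type and accessors; the names are illustrative only. */
typedef struct {
    unsigned char *pixels;
    int width, height;
} image_t;

static int in_bounds(const image_t *img, int x, int y)
{
    return x >= 0 && x < img->width && y >= 0 && y < img->height;
}

static unsigned char get_pixel(const image_t *img, int x, int y)
{
    return img->pixels[(size_t)y * img->width + x];
}

static void set_pixel(image_t *img, int x, int y, unsigned char colour)
{
    img->pixels[(size_t)y * img->width + x] = colour;
}

/* Recolour (x, y) and queue it if it is in bounds and still has the
   target colour. Recolouring at enqueue time means each pixel is
   queued at most once, so width * height slots are always enough. */
static void visit(image_t *img, int x, int y, unsigned char target,
                  unsigned char dest, int *queue, size_t *tail)
{
    if (in_bounds(img, x, y) && get_pixel(img, x, y) == target) {
        set_pixel(img, x, y, dest);
        queue[(*tail)++] = y * img->width + x;
    }
}

/* Returns the number of pixels recoloured, or -1 if allocation fails. */
long flood_fill_full_queue(image_t *img, int x, int y,
                           unsigned char target, unsigned char dest)
{
    if (target == dest || !in_bounds(img, x, y)
        || get_pixel(img, x, y) != target)
        return 0;

    size_t capacity = (size_t)img->width * img->height;
    int *queue = malloc(capacity * sizeof *queue);
    if (!queue)
        return -1;

    size_t head = 0, tail = 0;
    visit(img, x, y, target, dest, queue, &tail);
    while (head < tail) {
        int index = queue[head++];
        int px = index % img->width;
        int py = index / img->width;
        visit(img, px - 1, py, target, dest, queue, &tail);
        visit(img, px + 1, py, target, dest, queue, &tail);
        visit(img, px, py - 1, target, dest, queue, &tail);
        visit(img, px, py + 1, target, dest, queue, &tail);
    }
    free(queue);
    return (long)tail; /* every queued pixel was recoloured exactly once */
}
```

    The queue never needs to grow, which removes the realloc paths
    entirely. The cost is one int per pixel up front, and the sketch
    assumes width * height fits in an int.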

  • From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 02:30:59 2024
    Michael S <[email protected]> writes:

    On 16/03/2024 04:11, fir wrote:

    i was writing simple editor (something like paint but more custom
    for my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area
    of of one color into another color (becouse if no someone would
    need to do it by hand pixel by pixel and the need to change color
    of given element is very common)

    there is very simple method of doing it - i men i click in given
    color pixel then replace it by my color and call the same function
    on adjacent 4 pixels (only need check if it is in screen at all and
    if the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
    old_color, unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
    }

    return 0;
    }

    [...]

    Except I don't understand why it works at all.
    Can't the fill area have sub-areas that are only connected through a diagonal?

    It is customary in raster graphics to count pixels as adjacent
    only if they share an edge, not if they just share a corner.
    Usually that gives better results; the exceptions tend to need
    special handling anyway and not just connecting through
    diagonals.
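
    To make the two conventions concrete, here is a small sketch. The
    function names adjacent4 and adjacent8 are made up for this
    illustration and do not come from any code in the thread.

```c
#include <stdlib.h>

/* 4-adjacency: the two pixels share an edge. */
int adjacent4(int x1, int y1, int x2, int y2)
{
    return abs(x1 - x2) + abs(y1 - y2) == 1;
}

/* 8-adjacency: the two pixels share an edge or just a corner. */
int adjacent8(int x1, int y1, int x2, int y2)
{
    int dx = abs(x1 - x2);
    int dy = abs(y1 - y2);
    return (dx != 0 || dy != 0) && dx <= 1 && dy <= 1;
}
```

    Under the first rule a fill does not leak between two regions that
    touch only at a corner, which is the behaviour of the four recursive
    calls in the quoted code.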

  • From Tim Rentsch@21:1/5 to bart on Mon Mar 18 03:00:32 2024
    bart <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:

    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    The claim is that recursion always makes programs harder to reason
    about and prove correct. It's easy to find examples that show
    recursion does not always make programs harder to reason about and
    prove correct.

    I personally find recursion hard work and errors much harder to
    debug.

    Most likely that's because you haven't had the relevant background
    in learning how to program in a functional style. That matches my
    own experience: it was only after learning how to write programs in
    a functional style that I really started to appreciate the benefits
    of using recursion, and to understand how to write and reason about
    recursive programs.

    It also becomes much more important to show that it will not cause
    stack overflow.

    In most cases it's enough to show that the stack depth never exceeds
    log N for an input of size N. I use recursion quite routinely
    without there being any significant danger of stack overflow. It's
    a matter of learning which patterns are safe and which patterns are
    potentially dangerous, and avoiding the dangerous patterns (unless
    certain guarantees can be made to make them safe again).
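
    One familiar pattern with a logarithmic depth bound, given here as an
    illustration rather than as anything from the thread: a quicksort
    that recurses only into the smaller partition and loops over the
    larger one. Each recursive call then covers at most half of its
    caller's range, so the depth stays within about log2 N even when the
    pivots are bad. The names below, including the depth counter, are
    hypothetical and exist only for instrumentation.

```c
#include <stddef.h>

static int max_depth_seen; /* instrumentation only */

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Lomuto partition of v[lo..hi] (inclusive); returns the pivot's final index. */
static ptrdiff_t partition(int *v, ptrdiff_t lo, ptrdiff_t hi)
{
    int pivot = v[hi];
    ptrdiff_t i = lo;
    for (ptrdiff_t j = lo; j < hi; j++)
        if (v[j] < pivot)
            swap_int(&v[i++], &v[j]);
    swap_int(&v[i], &v[hi]);
    return i;
}

static void sort_range(int *v, ptrdiff_t lo, ptrdiff_t hi, int depth)
{
    if (depth > max_depth_seen)
        max_depth_seen = depth;
    while (lo < hi) {
        ptrdiff_t p = partition(v, lo, hi);
        if (p - lo < hi - p) {
            sort_range(v, lo, p - 1, depth + 1); /* smaller side: recurse */
            lo = p + 1;                          /* larger side: iterate */
        } else {
            sort_range(v, p + 1, hi, depth + 1); /* smaller side: recurse */
            hi = p - 1;                          /* larger side: iterate */
        }
    }
}

/* Sorts v[0..n-1] in ascending order and returns the deepest recursion level reached. */
int sort_and_report_depth(int *v, size_t n)
{
    max_depth_seen = 0;
    if (n > 1)
        sort_range(v, 0, (ptrdiff_t)n - 1, 1);
    return max_depth_seen;
}
```

    The running time can still degrade to quadratic on unlucky input;
    only the stack depth is bounded by this arrangement.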

  • From Tim Rentsch@21:1/5 to All on Mon Mar 18 03:04:57 2024
    bart <[email protected]> writes:

    P.S. You deserve credit for pointing out that the worst case
    for flood fill is changing the color of the entire pixel
    field. Maybe it was obvious to other people but I appreciate
    it.

  • From Michael S@21:1/5 to Tim Rentsch on Mon Mar 18 14:23:51 2024
    On Mon, 18 Mar 2024 03:00:32 -0700
    Tim Rentsch <[email protected]> wrote:

    bart <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:

    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove correct.


    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    The claim is that recursion always makes programs harder to reason
    about and prove correct. It's easy to find examples that show
    recursion does not always make programs harder to reason about and
    prove correct.

    I personally find recursion hard work and errors much harder to
    debug.

    Most likely that's because you haven't had the relevant background
    in learning how to program in a functional style. That matches my
    own experience: it was only after learning how to write programs in
    a functional style that I really started to appreciate the benefits
    of using recursion, and to understand how to write and reason about
    recursive programs.

    It also becomes much more important to show that it will not cause
    stack overflow.

    In most cases it's enough to show that the stack depth never exceeds
    log N for an input of size N. I use recursion quite routinely
    without there being any significant danger of stack overflow. It's
    a matter of learning which patterns are safe and which patterns are
    potentially dangerous, and avoiding the dangerous patterns (unless
    certain guarantees can be made to make them safe again).

    The problem in this case is that the maximum depth of recursion is
    O(N), where N is the total number of pixels to change color. So far I
    didn't find an obvious way to cut the worst case by more than a small
    factor without turning the recursive algorithm into something that is
    unrecognizably different from the original and requires a proof of
    correctness of its own.
    The classic "divide and conquer, smaller part first" strategy does not
    appear applicable here, or at least not obviously.
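
    The O(N) depth is easy to observe with an instrumented copy of the
    plain recursive fill. This is a sketch only: the depth counters and
    the name fill_and_report_depth are hypothetical additions, and depth
    here counts only the calls that actually recolour a pixel.

```c
static int current_depth; /* instrumentation only */
static int deepest;       /* instrumentation only */

static void fill_r(unsigned char *grey, int width, int height,
                   int x, int y, unsigned char target, unsigned char dest)
{
    if (x < 0 || x >= width || y < 0 || y >= height)
        return;
    if (grey[y * width + x] != target)
        return;
    grey[y * width + x] = dest;
    if (++current_depth > deepest)
        deepest = current_depth;
    fill_r(grey, width, height, x - 1, y, target, dest);
    fill_r(grey, width, height, x + 1, y, target, dest);
    fill_r(grey, width, height, x, y - 1, target, dest);
    fill_r(grey, width, height, x, y + 1, target, dest);
    --current_depth;
}

/* Fills from (x, y) and returns the deepest nesting of recolouring calls. */
int fill_and_report_depth(unsigned char *grey, int width, int height,
                          int x, int y,
                          unsigned char target, unsigned char dest)
{
    current_depth = 0;
    deepest = 0;
    if (target != dest) /* equal colours would recurse without end */
        fill_r(grey, width, height, x, y, target, dest);
    return deepest;
}
```

    On a blank 3x3 image seeded at the centre, this call order walks all
    9 pixels in a single chain, so the reported depth equals the pixel
    count. A one-pixel-high strip seeded at its right end behaves the
    same way.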

  • From Michael S@21:1/5 to Chris M. Thomasson on Mon Mar 18 14:40:07 2024
    On Sun, 17 Mar 2024 13:19:29 -0700
    "Chris M. Thomasson" <[email protected]> wrote:

    On 3/16/2024 1:29 PM, Chris M. Thomasson wrote:
    On 3/16/2024 1:02 PM, Malcolm McLean wrote:
    On 16/03/2024 18:21, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove
    correct.

    Are you prepared to offer any evidence to support this
    astonishing statement or can we just assume it's another
    Malcolmism?

    Example given. A recursive algorithm which is hard to reason
    about and

    Perhaps hard for _you_ to reason about.  That doesn't
    generalize to every other programmer that might read that
    code.


     From experience this one blows the stack, but not always.
    Sometimes it's OK to use.

    Blowing the stack is not good at all. However, sometimes, I
    consider a recursive algorithm easier to understand. So, I build
    it first... Get it working, _then_ think about an iterative
    solution...

    Gaining the iterative solution from a working recursive solution is
    the fun part!

    :^)


    I did.
    After a bit of polish applied to corners (on x86-64) it consumes
    approximately 60 times less extra memory than Malcolm's recursive
    variant and is approximately 2.5 times faster than the non-naive
    recursion.
    But it is still decisively slower than Malcolm's non-recursive code:
    ~4x for the 'snake' shape, ~2x for a solid rectangle.
    Malcolm's algorithm is simply better than the recursive one.
    Most likely because it visits already re-colored pixels less often.

    For those interested, here is 'explicit stack' variant of recursive
    algorithm:


    #include <stddef.h> /* ptrdiff_t */
    #include <stdlib.h> /* malloc, realloc, free */

    int floodfill_r_explicite_stack(
    unsigned char *grey,
    int width, int height,
    int x, int y,
    unsigned char target, unsigned char dest)
    {
    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;
    if (grey[y*width+x] != target)
    return 0;

    const ptrdiff_t initial_stack_sz = 256;
    char* stack = malloc(initial_stack_sz*sizeof(*stack));
    if (!stack)
    return -1;
    char* sp = stack;
    char* end_stack = &stack[initial_stack_sz];

    enum { ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN, };
    for (;;) {
    do {
    if (grey[y*width+x] != target)
    break; // back to caller

    grey[y*width+x] = dest;
    x -= 1;
    // push state to stack
    if (sp == end_stack) { // allocate more stack space
    ptrdiff_t old_sz = sp-stack;
    ptrdiff_t new_sz = old_sz + old_sz/2;
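    char* new_stack = realloc(stack, new_sz*sizeof(*stack));
    if (!new_stack) {
    free(stack); // realloc failure leaves the old block allocated
    return -1;
    }
    stack = new_stack;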
    sp = &stack[old_sz];
    end_stack = &stack[new_sz];
    }
    *sp++ = ST_LEFT; // recursive call
    } while (x >= 0);

    for (;;) {
    if (sp == stack) { // we are back at top level
    free(stack);
    return 1; // done
    }

    char state = *--sp; // pop stack (back to caller)
    switch (state) {
    case ST_LEFT:
    x += 2;
    if (x < width) {
    *sp++ = ST_RIGHT; // recursive call
    break;
    }
    // fall through

    case ST_RIGHT:
    x -= 1;
    y -= 1;
    if (y >= 0) {
    *sp++ = ST_UP; // recursive call
    break;
    }
    // fall through

    case ST_UP:
    y += 2;
    if (y < height) {
    *sp++ = ST_DOWN; // recursive call
    break;
    }
    // fall through

    case ST_DOWN:
    y -= 1;
    continue; // back to caller
    }
    break;
    }
    }
    }

  • From David Brown@21:1/5 to Malcolm McLean on Mon Mar 18 17:28:41 2024
    On 18/03/2024 10:26, Malcolm McLean wrote:
    On 18/03/2024 06:58, David Brown wrote:
    On 17/03/2024 19:28, Malcolm McLean wrote:
    On 17/03/2024 16:45, David Brown wrote:
    On 16/03/2024 16:09, Malcolm McLean wrote:

    The OP's code is simple and obvious, as is its correctness (assuming
    reasonable definitions of the pixel access and setting functions)
    and its time and space requirements.  Yours is not.

    Except it is not. You didn't give the right answer for the space
    requirements.

    Unfortunately, I am still fallible - /easier/ does not mean I'll get
    it right :-(  And I apologise for unhelpfully rushing that and getting
    it wrong.

    However, I stand by my claim that the recursive version is much easier
    to analyse.

    In this case it is very short code and easy to see that it is right, so a
    win for recursion.

    To be clear - I don't claim that recursive code is /always/ easier to
    analyse. I claim it is easier in this case, and in many other cases
    (thus countering your claim that it is /always/ harder).

    Except that it is only right if the stack is bigger
    than N/2 calls deep, where N is the number of pixels in the image.

    It is completely normal for correctness proofs to make assumptions about
    things like resources. An analysis of your code for correctness would
    also generally assume that the heap would be big enough - if the heap
    runs out, your code will not correctly flood-fill the image. Analysis
    of efficiency in time and space is a separate issue - related, but
    separate. Things like maximum recursion depth (and heap size) are very
    implementation-specific, and thus need to be considered separately from
    the algorithm itself.

    And while this code is in C, the same algorithm could be implemented in
    other languages. A language that uses a VM might be fine with a much
    higher recursion depth - or it might be much lower. A language for
    which recursion is a major tool (such as a functional programming
    language) might automatically convert some recursive code to a
    queue-based non-recursive solution. (I'd be impressed to see one do
    that for this algorithm, however.)

    Now a
    100x100 woked fine an my machine  - I just checked the main stack, and
    it's 8MB by default. BUt of cuuurse the bigger than machine, the bigger
    the image th euser might want to load.


    You still haven't considered using a spell-checker, even though you use
    a news client with one built in? Perhaps you need a better keyboard?


    It's better to have one function. Subroutines have a way of getting
    lost.

    Seriously?  "Subroutines get lost" ?  So your answer is to put all
    your ideas in a mixer and scrunch them up until any semblance of logic
    and structure is lost, and the code works (if it does) by trial and
    error? And then the whole mess is cut-and-paste duplicated - along
    with any subtle bugs it might have - for 8-connected version.  And
    that's better, in your eyes, than re-using code?

    Exactly. If a routine is a leaf, you can cut and paste it and use it where
    you will. If you have to take subroutines, you've got to explore the
    code to understand what you need to take, then you have to put them
    somewhere. So it's better to keep routines leaf if possible and fold a
    few trivial operations into the code body, even if ideally they would be
    subroutines. And I understand these trade-offs.

    That is a, shall we say, "interesting" attitude.

    I have been most interested in being able to be sure the algorithm is
    correct, rather than considering its absolute (rather than "big O")
    efficiency in different systems.  It is certainly the case that cache
    considerations are more relevant now than they used to be on many
    systems.  And for working on PC's, you would likely dispense with your
    growing stack entirely and simply allocate a queue big enough for
    every pixel in the image.

    That is an idea. But a bit extravagant. I'd like to try to work out
    how much queue is actually used in typical as well as worst case.

    The worst case is either going to be the stripy path example given by
    Michael S., or a completely blank image - it depends on how the
    east-west stripes affect the queue depth. It should not be hard to try
    these. So that would be either approximately half the total pixel
    count, or the total pixel count. And I can't think how you could
    specify a "typical" image and "typical" flood fill request - without
    specifying this in some way, you need to collect lots of statistics of
    real-world use, or it's mere guesswork.
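
    One way to try such cases is to instrument a fill so that it reports
    its own high-water mark. The sketch below uses a plain one-pixel-per-
    entry queue, not Malcolm's span-based queue, so its numbers say
    nothing directly about his routine; it only shows the measuring
    technique. The name fill_queue_peak is hypothetical.

```c
#include <stdlib.h>

/* Fills from (x, y) and returns the peak number of pixels waiting in
   the queue at any one time, or -1 if allocation fails. */
long fill_queue_peak(unsigned char *grey, int width, int height,
                     int x, int y,
                     unsigned char target, unsigned char dest)
{
    static const int dx[4] = { -1, 1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };

    if (target == dest || x < 0 || x >= width || y < 0 || y >= height
        || grey[y * width + x] != target)
        return 0;

    size_t capacity = (size_t)width * height;
    int *queue = malloc(capacity * sizeof *queue);
    if (!queue)
        return -1;

    size_t head = 0, tail = 0, peak = 1;
    grey[y * width + x] = dest; /* recolour on enqueue: queued once only */
    queue[tail++] = y * width + x;

    while (head < tail) {
        int index = queue[head++];
        int px = index % width;
        int py = index / width;
        for (int k = 0; k < 4; k++) {
            int nx = px + dx[k];
            int ny = py + dy[k];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            if (grey[ny * width + nx] != target)
                continue;
            grey[ny * width + nx] = dest;
            queue[tail++] = ny * width + nx;
            if (tail - head > peak)
                peak = tail - head; /* record the high-water mark */
        }
    }
    free(queue);
    return (long)peak;
}
```

    For this particular queue a one-pixel-wide corridor never holds more
    than one pending entry, while an open area holds a whole frontier at
    once. A different queue discipline will rank the same images
    differently, which is why the measurement has to be made on the
    actual routine.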


    I suggested separating the code into functions - that is /definitely/
    constructive.  I suggested using sensible names for parameters and
    variables (well, the suggestion was implied by my criticism).

    And I am also suggesting now that you allocate a queue that is big
    enough for every pixel in the image.  Much of what you don't touch of
    that space, will probably never be physically allocated by the OS,
    depending on page sizes and free memory.

    And I would also suggest you drop the requirement for coding in an
    ancient tongue, and instead switch to reasonably modern C.  Make
    abstractions for the types and the access functions - it will make the
    code far easier to follow, easier to show correct, and easier to
    modify and reuse, without affecting efficiency at run-time.


    And of course the entire binary image library has a consistent style.
    And we don't want the user messing about with writing his own getpixel
    / setpixel functions, though there would be a case for that for a
    general purpose flood fill.

    That would be the "royal we", I presume? I know /I/ would have no use
    for a flood-fill routine that did not support colour styles I use.

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Mon Mar 18 17:42:02 2024
    Malcolm McLean <[email protected]> writes:
    On 18/03/2024 16:28, David Brown wrote:
    On 18/03/2024 10:26, Malcolm McLean wrote:

    It is completely normal for correctness proofs to make assumptions about
    things like resources.  An analysis of your code for correctness would
    also generally assume that the heap would be big enough - if the heap
    runs out, your code will not correctly flood-fill the image.  Analysis
    of efficiency in time and space is a separate issue - related, but
    separate.  Things like maximum recursion depth (and heap size) are very
    implementation-specific, and thus need to be considered separately from
    the algorithm itself.


    It's trivial to engineer a system with a large stack and very small
    heap. But unlikely anyone would actually do so on a system on which
    floodfill would run.

    The first sentence is correct. Although with modern systems, 'small'
    is relative (my 12 year old workstation has 16GB RAM) and defaults
    to an 8MB stack, which can easily be increased on a per process or
    per user basis.

    The second is your opinion. What evidence do you have that
    your opinion is fact?

    And while this code is in C, the same algorithm could be implemented in
    other languages.  A language that uses a VM might be fine with a much
    higher recursion depth - or it might be much lower.  A language for
    which recursion is a major tool (such as a functional programming
    language) might automatically convert some recursive code to a
    queue-based non-recursive solution.  (I'd be impressed to see one do
    that for this algorithm, however.)

    Now a 100x100 woked fine an my machine  - I just checked the main
    stack, and it's 8MB by default. BUt of cuuurse the bigger than
    machine, the bigger the image th euser might want to load.


    You still haven't considered using a spell-checker, even though you use
    a news client with one built in?  Perhaps you need a better keyboard?

    I'll try it out. Since you're dyslexic.

    I believe you're conflating David with someone else
    who made that claim.

    Normal readers can read English

    Ah, a not-so-subtle insult to those who happen to suffer from dyslexia.

    text with just the initial and terminal letters right and the rest
    jumbled, at similar speed to normal text.

    Pixels usually represent objects. Take a glance around you. How many
    objects of a similar colour are spider's webs, lace curtains, long
    wires, and so on? And how many are pieces of paper, coffee cups,
    computer mice, and so on? And of course it mustn't fall over on the
    unusual objects, but the main consideration is usually that it is fast
    and efficient on the common ones.

    The main consideration must be that it is sufficient, readable and
    maintainable. Fast and efficient aren't always a driving goal,
    particularly for rarely used operations.

  • From bart@21:1/5 to Scott Lurndal on Mon Mar 18 18:50:40 2024
    On 18/03/2024 17:42, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 18/03/2024 16:28, David Brown wrote:
    On 18/03/2024 10:26, Malcolm McLean wrote:

    It is completely normal for correctness proofs to make assumptions about
    things like resources.  An analysis of your code for correctness would
    also generally assume that the heap would be big enough - if the heap
    runs out, your code will not correctly flood-fill the image.  Analysis
    of efficiency in time and space is a separate issue - related, but
    separate.  Things like maximum recursion depth (and heap size) are very
    implementation-specific, and thus need to be considered separately from
    the algorithm itself.


    It's trivial to engineer a system with a large stack and very small
    heap. But unlikley anyone would actually do so on a system on which
    floodfill would run.

    The first sentence is correct. Although with modern systems, 'small'
    is relative (my 12 year old workstation has 16GB RAM) and defaults
    to an 8MB stack, which can easily be increased on a per process or
    per user basis.

    The second is your opinion. What evidence do you have that
    your opinion is fact?

    It seems the most likely. People don't run programs whose sole purpose
    is to floodfill, so that they can request a huge stack.

    It will likely be part of a much larger application with conventional
    stack usage.

    The floodfill may be part of a library, and itself wrapped by another
    library that the application knows about.

    It is even possible that when the application is built, it doesn't know
    that a floodfill routine is to be called. (For example, an interpreter
    that will run a program that /might/ call a floodfill routine.)

    As the author of such a routine, you don't want to have to rely on a
    stack large enough to cope with, say, a 30Mpix image which might need a
    30M-deep maximum call-depth, which could easily use up 500MB of stack
    memory.
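
    The arithmetic behind a figure like that can be sketched as below. The
    bytes-per-frame value is an assumption: it depends on the compiler, the
    ABI and the optimisation level, and is not something measured in this
    thread. The helper name stack_estimate is made up for the illustration.

```c
/* Rough worst-case stack demand of a recursive fill that uses one call
   frame per pixel.  bytes_per_frame is an assumed figure, not a measured
   one (return address, saved registers, four arguments, padding). */
static unsigned long long
stack_estimate(unsigned long long max_depth, unsigned bytes_per_frame)
{
    return max_depth * bytes_per_frame;
}
```

    With 30,000,000 frames and an assumed 16 bytes per frame this gives
    480,000,000 bytes, close to the 500MB mentioned above; at an assumed
    48 bytes per frame it would be about 1.4GB.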

  • From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 11:36:29 2024
    Michael S <[email protected]> writes:

    On Sun, 17 Mar 2024 18:25:20 +0200
    Michael S <[email protected]> wrote:

    On Sun, 17 Mar 2024 14:56:34 +0000
    Malcolm McLean <[email protected]> wrote:

    [a floodfill routine posted by Malcolm]

    [...]

    [a recursive area fill written by Michael S]

    I did my own measurements with snake-like image from my first
    response to Malcolm. For this shape, recursive version (after my
    improvement) is almost exactly 10 times slower than Malcolm's
    iterative code. And susceptible to stack overflow, although a little
    less so than original.

    It's hard to write a recursive area fill routine if one wants to
    guarantee worst case behavior in all cases. This problem is not
    a good fit to using recursion without there being some kind of
    constraints on what the inputs will be.

    Even if in Big Oh sense they are the same, it does look like
    Malcolm's variant is decisively faster in practice.

    I've done some tests with Malcolm's code. Some observations:

    It uses more memory than it needs to.

    It's anisotropic, which is to say it behaves differently with
    respect to changes in width than it does to changes in height.

    It doesn't scale well. In particular worst case performance
    scaling is worse than O(N) (as determined experimentally, not
    theoretically).

    The code is much longer than is needed just to do an area fill.
    A small fraction of that is simply layout style, but mostly it's
    that the code is more complicated than it needs to be.

  • From Tim Rentsch@21:1/5 to fir on Mon Mar 18 13:09:08 2024
    fir <[email protected]> writes:

    i was writing simple editor (something like paint but more custom for
    my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area of
    of one color into another color (becouse if no someone would need to
    do it by hand pixel by pixel and the need to change color of given
    element is very common)

    there is very simple method of doing it - i men i click in given color
    pixel then replace it by my color and call the same function on
    adjacent 4 pixels (only need check if it is in screen at all and if
    the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
    }

    return 0;
    }

    it work but im not quite sure how to estimate the safety of this - incidentally as i said i use this editor to low res graphics like
    200x200 pixels or less, and it is only a toll of private use,
    yet i got no time to work on it more than 1-2-3 days i guess but still

    is there maybe simple way to improve it?

    As others have explained using simple recursion like this
    runs the risk of producing a stack overflow.

    Here is a short routine that uses allocated memory rather than
    recursion, and so does not have the stack overflow risk that
    the above recursive routine does.

    The code below uses a slightly different interface to access
    the pixel field but I expect you can see how to adapt it to
    your interface.

    Also the code uses a variably modified type in two places. It
    should be easy to change the code to use ordinary types rather
    than variably modified types if it's important to do that in
    your environment. And it may be the case that changing to use
    a different interface to access and change the pixel field will
    get rid of the variably modified types so that they wouldn't be
    needed anyway.

    Oh, before I forget. If someone doesn't like using a single
    fixed-size allocated area, it isn't hard to change the code
    so that the allocated area grows as needed (and starting with
    a smaller size, presumably). I leave doing that as an
    exercise.

    The code:

    #include <assert.h>
    #include <stdlib.h>   /* malloc, free */
    #include <string.h>   /* memmove */

    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
        /* -1 converts to UINT_MAX, so p.x + deltas[i].x wraps to p.x - 1,
           and stepping left of 0 fails the p.x >= w test in change_it */
        static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
        Index k = 0;
        Index n = (w+h) *17 /16 +10;
        Point *todo;

        /* with old == new a changed pixel still looks unchanged, so it
           would be queued again and again */
        if( old == new ) return;
        todo = malloc( n * sizeof *todo );

        if( todo && change_it( w, h, pixels, p0, old, new ) ) todo[k++] = p0;

        while( k > 0 ){
            /* move the pending points to the top of the buffer, then
               refill from the bottom while consuming from the top */
            Index j = n-k;
            memmove( todo + j, todo, k * sizeof *todo );
            k = 0;

            while( j < n ){
                Point p = todo[ j++ ];
                for( Index i = 0; i < 4; i++ ){
                    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                    assert( j > k );
                    todo[ k++ ] = q;
                }
            }
        }

        free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
        if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
        return pixels[p.x][p.y] = new, 1;
    }

  • From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 14:13:19 2024
    Michael S <[email protected]> writes:

    On Mon, 18 Mar 2024 03:00:32 -0700
    Tim Rentsch <[email protected]> wrote:

    bart <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:

    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    The claim is that recursion always makes programs harder to reason
    about and prove correct. It's easy to find examples that show
    recursion does not always make programs harder to reason about and
    prove correct.

    I personally find recursion hard work and errors much harder to
    debug.

    Most likely that's because you haven't had the relevant background
    in learning how to program in a functional style. That matches my
    own experience: it was only after learning how to write programs in
    a functional style that I really started to appreciate the benefits
    of using recursion, and to understand how to write and reason about
    recursive programs.

    It also becomes much more important to show that it will not cause
    stack overflow.

    In most cases it's enough to show that the stack depth never exceeds
    log N for an input of size N. I use recursion quite routinely
    without there being any significant danger of stack overflow. It's
    a matter of learning which patterns are safe and which patterns are
    potentially dangerous, and avoiding the dangerous patterns (unless
    certain guarantees can be made to make them safe again).

    The problem in this case is that max. depth of recursion is O(N)
    where N is total number of pixels to change color. So far I
    didn't find an obvious way to cut the worst case by more than
    small factor without turning recursive algorithm into something
    that is unrecognizably different from original and require proof
    of correction of its own. Classic 'divide and conquer smaller
    part first" strategy does not appear applicable here, or at least
    not obviously.

    Right. I said as much in another reply to you. This problem
    is not well suited to a recursive solution.

    To clarify my earlier comment, when I say I routinely use
    recursion I do not mean I always use recursion. Part of
    understanding programming in a functional style is knowing
    when not to use recursion as well as when to use it.
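
    As an illustration of the kind of pattern that keeps stack depth
    logarithmic, here is a sketch of the "smaller part first" strategy
    mentioned above, applied to quicksort rather than to area fill. It is
    not code from this thread; the names and the depth counter are
    invented for the example. Each recursive call handles at most half of
    the current range, and the larger half is handled by the loop, so the
    nesting depth stays within about log2(N) even when the running time
    degrades.

```c
#include <stddef.h>

/* Depth bookkeeping, for illustration only. */
static unsigned qs_depth, qs_max_depth;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition of a[lo..hi] (inclusive); returns the pivot's final index. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot) swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Sort the half-open range a[lo..hi).  Recurse only into the smaller
   part and loop on the larger one. */
static void quicksort(int *a, size_t lo, size_t hi)
{
    if (++qs_depth > qs_max_depth) qs_max_depth = qs_depth;
    while (hi - lo > 1) {
        size_t p = partition(a, lo, hi - 1);
        if (p - lo < hi - (p + 1)) {
            quicksort(a, lo, p);       /* left part is the smaller one */
            lo = p + 1;
        } else {
            quicksort(a, p + 1, hi);   /* right part is smaller or equal */
            hi = p;
        }
    }
    --qs_depth;
}
```

    The four-way area fill has no comparable split: a one-pixel-wide path
    offers nothing to halve, which is why the same trick does not carry
    over.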

  • From Tim Rentsch@21:1/5 to All on Mon Mar 18 22:42:14 2024
    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    #include <stdlib.h>   /* malloc, realloc, free */
    #include <string.h>   /* memmove */

    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
        static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
        UI k = 0;
        UI n = 17;
        Point *todo;

        /* with old == new a changed pixel still looks unchanged, so the
           buffer would grow until realloc fails */
        if( old == new ) return;
        todo = malloc( n * sizeof *todo );

        if( todo && change_it( w, h, pixels, p0, old, new ) ) todo[k++] = p0;

        while( k > 0 ){
            Index j = n-k;
            memmove( todo + j, todo, k * sizeof *todo );
            k = 0;

            while( j < n ){
                Point p = todo[ j++ ];
                for( Index i = 0; i < 4; i++ ){
                    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                    todo[ k++ ] = q;
                }

                /* keep a gap between the write end (k) and the read
                   end (j); grow by a quarter when it gets too small */
                if( j-k < 3 ){
                    Index new_n = n+n/4;
                    Index new_j = new_n - (n-j);
                    Point *t = realloc( todo, new_n * sizeof *t );
                    if( !t ){ k = 0; break; }
                    memmove( t + new_j, t + j, (n-j) * sizeof *t );
                    todo = t, n = new_n, j = new_j;
                }
            }
        }

        free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
        if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
        return pixels[p.x][p.y] = new, 1;
    }

  • From Tim Rentsch@21:1/5 to Malcolm McLean on Mon Mar 18 22:54:37 2024
    Malcolm McLean <[email protected]> writes:

    On 18/03/2024 09:30, Tim Rentsch wrote:

    Michael S <[email protected]> writes:
    [...]
    Except I don't understand why it works at all.
    Can't fill area have sub-areas that are only connected through
    diagonal?

    It is customary in raster graphics to count pixels as adjacent
    only if they share an edge, not if they just share a corner.
    Usually that gives better results; the exceptions tend to need
    special handling anyway and not just connecting through
    diagonals.

    Though with a binary image, if the foreground is 4-connected, the
    background must therefore be 8-connected.

    It might be but it doesn't have to be.

    Also different terminology should be used, since 4-connected
    (also N-connected, for other integer N) has a specific meaning in
    graph theory, and one very different than what is meant above.
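
    To make the difference between the two adjacency rules concrete, here
    is a small sketch, not taken from any code in this thread, of a fill
    with an explicit work list where the caller chooses whether pixels
    must share an edge (4 neighbours) or may also share only a corner
    (8 neighbours). The function name fill_nbrs and the row-major byte
    image are assumptions made for the example.

```c
#include <stdlib.h>

/* Recolour the region containing (x0,y0).  nbrs == 4 treats pixels as
   adjacent only if they share an edge; nbrs == 8 also accepts a shared
   corner.  img is row-major, w*h bytes.  Returns the number of pixels
   changed, or -1 if the work list could not be allocated. */
static long fill_nbrs(unsigned char *img, int w, int h, int x0, int y0,
                      unsigned char old, unsigned char new_color, int nbrs)
{
    /* the first four entries are the edge neighbours */
    static const int dx[8] = { 1, -1, 0,  0, 1,  1, -1, -1 };
    static const int dy[8] = { 0,  0, 1, -1, 1, -1,  1, -1 };

    if (old == new_color || x0 < 0 || x0 >= w || y0 < 0 || y0 >= h) return 0;
    if (img[y0 * w + x0] != old) return 0;

    /* A pixel is recoloured at the moment it is pushed, so it is pushed
       at most once and w*h entries always suffice. */
    int *stack = malloc((size_t)w * h * sizeof *stack);
    if (!stack) return -1;

    long changed = 0;
    size_t top = 0;
    img[y0 * w + x0] = new_color;
    stack[top++] = y0 * w + x0;

    while (top > 0) {
        int idx = stack[--top];
        int x = idx % w, y = idx / w;
        changed++;
        for (int i = 0; i < nbrs; i++) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            if (img[ny * w + nx] != old) continue;
            img[ny * w + nx] = new_color;
            stack[top++] = ny * w + nx;
        }
    }
    free(stack);
    return changed;
}
```

    On a 3x3 image whose only foreground pixels lie on the main diagonal,
    a fill started at the top-left corner changes one pixel under the
    edge-sharing rule and three under the corner-sharing rule.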

  • From Tim Rentsch@21:1/5 to Malcolm McLean on Mon Mar 18 23:10:21 2024
    Malcolm McLean <[email protected]> writes:

    On 18/03/2024 18:36, Tim Rentsch wrote:


    It doesn't scale well. In particular worst case performance
    scaling is worse than O(N) (as determined experimentally, not
    theoretically).

    Is that because the queue is being memmoved instead of using a
    circular buffer when it gets towards the end?

    I'm sure I don't know, and I'm astonished that you would ask.
    It's your code after all. IMO it should simply be thrown out and
    re-written; it pains me just to look at it, let alone to try to
    understand or fix it.
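
    For readers unfamiliar with the alternative raised in the question, a
    circular buffer avoids shifting the queue contents by letting the head
    index wrap around. The sketch below only illustrates that idea; it is
    not the code under discussion, and the Pt and Ring types and the
    ring_* names are made up for the example.

```c
#include <stdlib.h>

typedef struct { int x, y; } Pt;      /* hypothetical point type */

typedef struct {
    Pt *buf;
    size_t cap;    /* number of slots */
    size_t head;   /* index of the oldest element */
    size_t len;    /* number of stored elements */
} Ring;

/* Returns 1 on success, 0 if the buffer could not be allocated. */
static int ring_init(Ring *r, size_t cap)
{
    r->buf = malloc(cap * sizeof *r->buf);
    r->cap = cap;
    r->head = 0;
    r->len = 0;
    return r->buf != NULL;
}

/* Append at the tail; returns 0 if the ring is full. */
static int ring_push(Ring *r, Pt p)
{
    if (r->len == r->cap) return 0;
    r->buf[(r->head + r->len) % r->cap] = p;
    r->len++;
    return 1;
}

/* Remove the oldest element; returns 0 if the ring is empty. */
static int ring_pop(Ring *r, Pt *out)
{
    if (r->len == 0) return 0;
    *out = r->buf[r->head];
    r->head = (r->head + 1) % r->cap;
    r->len--;
    return 1;
}

static void ring_free(Ring *r) { free(r->buf); r->buf = NULL; }
```

    Both operations take constant time, so no memmove is needed when the
    tail reaches the end of the allocation. Whether that explains the
    measured scaling of the routine discussed here is not established by
    this sketch.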

  • From David Brown@21:1/5 to Malcolm McLean on Tue Mar 19 11:32:06 2024
    On 18/03/2024 18:25, Malcolm McLean wrote:
    On 18/03/2024 16:28, David Brown wrote:
    On 18/03/2024 10:26, Malcolm McLean wrote:

    It is completely normal for correctness proofs to make assumptions
    about things like resources.  An analysis of your code for correctness
    would also generally assume that the heap would be big enough - if the
    heap runs out, your code will not correctly flood-fill the image.
    Analysis of efficiency in time and space is a separate issue -
    related, but separate.  Things like maximum recursion depth (and heap
    size) are very implementation-specific, and thus need to be considered
    separately from the algorithm itself.


    It's trivial to engineer a system with a large stack and very small
    heap. But unlikley anyone would actually do so on a system on which
    floodfill would run.

    It is unlikely (that is how the word is spelt) that people would use
    your code for a flood fill either - but I fully agree that it is
    unlikely that a simple recursive flood fill algorithm is much use as a
    practical way to do flood fills in any real-world software.


    And while this code is in C, the same algorithm could be implemented
    in other languages.  A language that uses a VM might be fine with a
    much higher recursion depth - or it might be much lower.  A language
    for which recursion is a major tool (such as a functional programming
    language) might automatically convert some recursive code to a
    queue-based non-recursive solution.  (I'd be impressed to see one do
    that for this algorithm, however.)

    Now a 100x100 woked fine an my machine  - I just checked the main
    stack, and it's 8MB by default. BUt of cuuurse the bigger than
    machine, the bigger the image th euser might want to load.


    You still haven't considered using a spell-checker, even though you
    use a news client with one built in?  Perhaps you need a better keyboard?
    I'll try it out. Since you're dyslexic. Normal readers can read English
    text with just the initial and terminal letters right and the rest
    jumbled, at similar speed to normal text.

    You are fond of making up "Malcolm facts" about the cognitive effort
    involved in understanding things like nested parentheses. Poor
    spelling, typos, grammatical errors, and the like similarly increase the
    cognitive effort in reading your posts. When it takes too much effort
    to figure out what you are trying to say, it is not worth the bother.

    I am not looking for perfection here - mistakes happen. I am merely
    looking for a minimum of effort on your part, such as using the
    spell-checker built into your newsreader. And I don't expect you to do
    this for /me/, or because I or anyone else is dyslexic. I consider it a
    basic level of politeness and respect for others. I find it
    extraordinary that you have been so reluctant to take this step before now.


    The worst case is either going to be the stripy path example given by
    Michael S., or a completely blank image - it depends on how the
    east-west stripes affect the queue depth.  It should not be hard to
    try these.  So that would be either approximately half the total pixel
    count, or the total pixel count.  And I can't think how you could
    specify a "typical" image and "typical" flood fill request - without
    specifying this in some way, you need to collect lots of statistics of
    real-world use, or it's mere guesswork.

    Pixels usually represent objects. Take a glance around your. How many
    objects of a similar colour are spider's webs,  lace curtains, long
    wires, and so on. And how many are pieces of paper, coffee cups.
    computer mice, and so on? And of course it mustn't fall over on the
    unusual objects, but the main consideration is usually that it is fast
    and efficient on the common ones.

    There is nothing that I or anyone else can see that could possibly be
    considered a "typical" image - though there are clearly things that are
    commonly seen. And simple colour-matching flood fills are totally
    pointless on any real image (photographs, realistic renderings, etc.).


    But not always of course, Sometimes results must in in under 0.1
    seconds, and so 0.09 is as good as 0.01, but 0.11 on a rare spider's web
    is catastrophic.

    I cannot imagine the situation where you would have a hard real-time
    limit of 0.1 seconds to do a simplistic flood fill on a rare spider's
    web (or picture thereof).


    I think it would suffice to test the code on a few worst-case samples,
    and a few examples of images you have yourself that you need to
    flood-fill. If the speed is good enough on your computer during such
    tests, that would be all you need. You are not making a real-world
    reusable graphics library or a serious image manipulation tool here.
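
    One way to produce a worst-case sample of the "snake" kind is sketched
    below. The exact shape used earlier in the thread is not reproduced
    here, so this serpentine layout is an assumption: full corridor rows
    alternate with wall rows, and each wall row has a single connecting
    pixel at alternating ends. The function name make_snake is invented
    for the example.

```c
#include <string.h>

/* Draw a one-pixel-wide serpentine corridor of colour `path` on a
   background of colour `wall` in a row-major w*h image.  Even rows are
   corridor; an odd row is wall except for one linking pixel, placed
   alternately at the right and left edge, and only if a corridor row
   follows it.  Returns the number of corridor pixels. */
static long make_snake(unsigned char *img, int w, int h,
                       unsigned char path, unsigned char wall)
{
    long count = 0;
    memset(img, wall, (size_t)w * h);
    for (int y = 0; y < h; y++) {
        if (y % 2 == 0) {
            memset(img + (size_t)y * w, path, (size_t)w);
            count += w;
        } else if (y + 1 < h) {
            int x = (y / 2) % 2 == 0 ? w - 1 : 0;
            img[(size_t)y * w + x] = path;
            count++;
        }
    }
    return count;
}
```

    The corridor covers roughly half of the image and forms a single long
    path, so a fill started at one end has to follow it pixel by pixel.
    Timing a fill on this image and on a completely blank image of the
    same size gives two data points of the kind suggested above.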



    That would be the "royal we", I presume?  I know /I/ would have no use
    for a flood-fill routine that did not support colour styles I use.


    This routine is part of the binary image processing library, so of
    course it is written to be easy to use with binary images, or binary
    images which have been processed and are no longer strictly binary
    images. But if people want to take it and use it as pattern for a
    general flood fill, then of course I'm perfectly happy that they have
    found the code to be of use.

  • From David Brown@21:1/5 to Scott Lurndal on Tue Mar 19 11:41:01 2024
    On 18/03/2024 18:42, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 18/03/2024 16:28, David Brown wrote:

    You still haven't considered using a spell-checker, even though you use
    a news client with one built in?  Perhaps you need a better keyboard?

    I'll try it out. Since you're dyslexic.

    I believe you're conflating David with someone else
    who made that claim.

    I did, in another post, say that I am mildly dyslexic. The context was
    that I understand spelling can be difficult - my spelling, without a
    spell-checker, is often terrible. But my reading level is very high.

    I expect most people here can figure out the words Malcolm meant to type
    when he fails to press the right keys. But we should not have to do so.

  • From David Brown@21:1/5 to Keith Thompson on Tue Mar 19 12:31:05 2024
    On 18/03/2024 19:10, Keith Thompson wrote:
    Malcolm McLean <[email protected]> writes:
    On 18/03/2024 16:28, David Brown wrote:
    [...]
    You still haven't considered using a spell-checker, even though you
    use a news client with one built in?  Perhaps you need a better
    keyboard?

    I'll try it out. Since you're dyslexic. Normal readers can read
    English text with just the initial and terminal letters right and the
    rest jumbled, at similar speed to normal text.

    I will not speculate about why you seem to be unaware that calling
    someone dyslexic is insulting, both to the person you're addressing and
    to people with dyslexia.

    You need to stop making disparaging personal comments.


    I think Malcolm is truly unaware of how these kinds of comments could be
    taken.

    To be clear here, I did mention in another post that I am dyslexic. So
    he was not saying it out of the blue.

    However, it does not seem that he has a very good idea of what dyslexia,
    in all its forms and variations, actually is. My dyslexia does not
    affect my reading at all (as far as any measurements have ever shown),
    but it affects my spelling quite a lot. (It has other effects too, but
    we don't need to cover everything here.)

    (Malcolm's comment about "normal readers" reading jumbled text has a
    grain of truth to it, but not much more than a grain.)

  • From Michael S@21:1/5 to Tim Rentsch on Tue Mar 19 13:18:42 2024
    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
        static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
        UI k = 0;
        UI n = 17;
        Point *todo = malloc( n * sizeof *todo );

        if( todo && change_it( w, h, pixels, p0, old, new ) ) todo[k++] = p0;

        while( k > 0 ){
            Index j = n-k;
            memmove( todo + j, todo, k * sizeof *todo );
            k = 0;

            while( j < n ){
                Point p = todo[ j++ ];
                for( Index i = 0; i < 4; i++ ){
                    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                    todo[ k++ ] = q;
                }

                if( j-k < 3 ){
                    Index new_n = n+n/4;
                    Index new_j = new_n - (n-j);
                    Point *t = realloc( todo, new_n * sizeof *t );
                    if( !t ){ k = 0; break; }
                    memmove( t + new_j, t + j, (n-j) * sizeof *t );
                    todo = t, n = new_n, j = new_j;
                }
            }
        }

        free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
        if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
        return pixels[p.x][p.y] = new, 1;
    }

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.
    Is it the same algorithm?

    Besides, I don't think that use of VLA in library code is a good idea.
    VLA is optional in latest C standards. And incompatible with C++.
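
    For what it is worth, the variably modified parameter type can be
    replaced by a plain pointer with manual indexing, as the original
    posting suggested. The sketch below shows only the pixel test-and-set
    helper; the name change_it_flat and the parameter name new_color are
    assumptions made here (new_color instead of new so that the code also
    compiles as C++).

```c
#include <stddef.h>

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;

/* Same test-and-set as change_it(), but on a flat buffer.  Element
   [x][y] of a Color[w][h] array lives at offset x*h + y, so the layout
   matches the original interface. */
static int change_it_flat(UI w, UI h, Color *pixels, Point p,
                          Color old, Color new_color)
{
    if (p.x >= w || p.y >= h) return 0;
    Color *px = &pixels[(size_t)p.x * h + p.y];
    if (*px != old) return 0;
    *px = new_color;
    return 1;
}
```

    The fill loop itself does not index the pixel array directly, so
    under this assumption only the helper and the parameter lists would
    need to change.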

  • From Richard Harnden@21:1/5 to David Brown on Tue Mar 19 12:19:47 2024
    On 19/03/2024 10:41, David Brown wrote:

    I expect most people here can figure out the words Malcolm meant to type
    when he fails to press the right keys.  But we should not have to do so.


    So ... the poster should make the effort once, rather than the 1000s of
    readers should be made to make the effort 1000s of times.

    This is usenet etiquette 101.

  • From Peter 'Shaggy' Haywood@21:1/5 to All on Tue Mar 19 10:57:55 2024
    Groovy hepcat Lew Pitcher was jivin' in comp.lang.c on Mon, 18 Mar 2024
    01:27 am. It's a cool scene! Dig it.

    On 16/03/2024 04:11, fir wrote:

    [Snip.]

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x,  y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y,  old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y,  old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1,  old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1,  old_color, new_color);
    return 1;
    }

    return 0;
    }

    [Snippity doo dah.]

    Take fir's example code above; a simple single call to
    RecolorizePixelAndAdjacentOnes() will effectively recolour the
    origin cell multiple times, because of how the recursion is handled.

    No, I don't think so. You seem to have missed the fact that it checks
    the colour of the "current" pixel, and only continues (setting new
    colour & recursing) if it is the old colour.
    Of course, I'm inferring (guessing) the functionality, at least
    partially (Unsafe? Safe?), of GetPixelUnsafe() and SetPixelSafe() based
    on their names.
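
    A quick way to check this is to count writes per pixel. The sketch
    below wraps fir's function with stand-in helpers; what
    XYIsInScreen(), GetPixelUnsafe() and SetPixelSafe() really do is
    guessed from their names, and the small fixed-size screen, the writes
    array and max_writes() are made up for the test.

```c
enum { W = 8, H = 6 };
static unsigned screen[H][W];
static unsigned writes[H][W];   /* how many times each pixel was set */

/* Stand-ins for fir's helpers; their real behaviour is only guessed. */
static int XYIsInScreen(int x, int y)
{
    return x >= 0 && x < W && y >= 0 && y < H;
}
static unsigned GetPixelUnsafe(int x, int y) { return screen[y][x]; }
static void SetPixelSafe(int x, int y, unsigned c)
{
    if (XYIsInScreen(x, y)) { screen[y][x] = c; writes[y][x]++; }
}

/* fir's routine, unchanged apart from layout. */
static int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                          unsigned new_color)
{
    if (old_color == new_color) return 0;

    if (XYIsInScreen(x, y))
        if (GetPixelUnsafe(x, y) == old_color) {
            SetPixelSafe(x, y, new_color);
            RecolorizePixelAndAdjacentOnes(x + 1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x - 1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y - 1, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y + 1, old_color, new_color);
            return 1;
        }

    return 0;
}

/* Largest write count over the whole screen. */
static unsigned max_writes(void)
{
    unsigned m = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if (writes[y][x] > m) m = writes[y][x];
    return m;
}
```

    After filling an all-zero screen from any starting pixel, max_writes()
    is 1: a pixel is visited again from its neighbours, but the colour
    test fails on those later visits, so it is set only once.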

    [Snip Lew's examples.]

    --


    ----- Dig the NEW and IMPROVED news sig!! -----


    -------------- Shaggy was here! ---------------
    Ain't I'm a dawg!!

  • From Michael S@21:1/5 to Malcolm McLean on Tue Mar 19 15:49:00 2024
    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:


    No. Mine takes horizontal scan lines and extends them, then places
    the pixels above and below in a queue to be considered as seeds for
    the next scan line. (It's not mine, but I don't know who invented it.
    It wasn't me.)

    Tim, now what does it do? Essentially it's the recursive fill
    algorithm but with the data only on the stack instead of the call and
    the data. And todo is actually a queue rather than a stack.

    Now why would it be slower? Probaby because you usually only hit a
    pixel three times with mine - once below, once above, and once for
    the scan line itself, whilst you consider it 5 times for Tim's - once
    for each neighbour and once for itself. Then horizontally adjacent
    pixels are more likely to be in the same cache line than vertically
    adjacent pixels, so processing images in scan lines tends to be a bit
    faster.
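
    The scan-line scheme described in the quoted text can be sketched
    roughly as follows. This is not the code under discussion, which is
    not reproduced in this posting; it is a minimal version written for
    illustration. It keeps the seeds on a growable LIFO stack where the
    description speaks of a queue (the order in which seeds are processed
    does not affect the result), and the names Seed, push and
    scanline_fill are assumptions.

```c
#include <stdlib.h>

typedef struct { int x, y; } Seed;

/* Push a seed, doubling the allocation when full.  Returns 0 on
   allocation failure, 1 otherwise. */
static int push(Seed **st, size_t *n, size_t *cap, int x, int y)
{
    if (*n == *cap) {
        size_t ncap = *cap ? *cap * 2 : 64;
        Seed *t = realloc(*st, ncap * sizeof *t);
        if (!t) return 0;
        *st = t;
        *cap = ncap;
    }
    (*st)[*n].x = x;
    (*st)[*n].y = y;
    (*n)++;
    return 1;
}

/* Scan-line fill on a row-major w*h image.  Returns 1 on success, 0 if
   there was nothing to fill, -1 on allocation failure (the image may
   then be partly filled). */
static int scanline_fill(unsigned char *img, int w, int h, int x, int y,
                         unsigned char target, unsigned char dest)
{
    if (target == dest || x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (img[y * w + x] != target) return 0;

    Seed *st = NULL;
    size_t n = 0, cap = 0;
    int ok = push(&st, &n, &cap, x, y);

    while (ok && n > 0) {
        Seed s = st[--n];
        unsigned char *row = img + (size_t)s.y * w;
        if (row[s.x] != target) continue;   /* reached via another seed */

        /* extend the span left and right, then fill it */
        int x0 = s.x, x1 = s.x;
        while (x0 > 0 && row[x0 - 1] == target) x0--;
        while (x1 + 1 < w && row[x1 + 1] == target) x1++;
        for (int i = x0; i <= x1; i++) row[i] = dest;

        /* seed the rows above and below: one seed per run of target
           pixels that touches the span */
        for (int dy = -1; dy <= 1 && ok; dy += 2) {
            int ny = s.y + dy;
            if (ny < 0 || ny >= h) continue;
            unsigned char *nrow = img + (size_t)ny * w;
            for (int i = x0; i <= x1 && ok; i++)
                if (nrow[i] == target && (i == x0 || nrow[i - 1] != target))
                    ok = push(&st, &n, &cap, i, ny);
        }
    }
    free(st);
    return ok ? 1 : -1;
}
```

    Because whole horizontal runs are filled at once and only one seed is
    stored per run, the work list stays short on solid shapes, which fits
    the explanation given above for why span-based fills tend to be quick
    on typical images.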


    Below is a variant of recursive algorithm that is approximately as
    fast as your code (1.25x faster for filling solid rectangle, 1.43x
    slower for filling snake shape).
    The code is a bit long, but I hope that the logic is still obvious and
    there is no need to prove correctness.
    I have a micro-optimized variant of the same algorithm that is as fast
    or faster than yours in all cases that I tested, but posting
    micro-optimized code on c.l.c is bad sportsmanship.
    Recursion depth of this algorithm for typical solid shape is
    O(max(width,height)), but for a worst case it is still very bad, about
    N/4. And since there are more local variables to preserve, the worst
    case size of occupied stack is likely even bigger than in simple (but
    non-naive) recursion. So, while fast, I wouldn't use this algorithm in
    a general-purpose library.
    But it can serve as a reference point for implementation with explicit
    stack.


    struct recursive_context_t {
      unsigned char *grey;
      int width, height;
      unsigned char target, dest;
    };

    static void floodfill_r_core(const struct recursive_context_t* context,
                                 int x, int y);

    int floodfill_r(
      unsigned char *grey,
      int width, int height,
      int x, int y,
      unsigned char target, unsigned char dest)
    {
      if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
      if (grey[y*width+x] != target)
        return 0;
      struct recursive_context_t context = {
        .grey = grey,
        .width = width,
        .height = height,
        .target = target,
        .dest = dest,
      };
      floodfill_r_core(&context, x, y);
      return 1;
    }

    static void floodfill_r_core(const struct recursive_context_t* context,
                                 int x, int y) {
      // point (x,y) is in target rectangle and has target color.
      // It's guaranteed by caller

      // Find maximal cross (of Saint George's variety) with target color
      // and center at (x,y)
      // go left
      int x0;
      for (x0 = x-1; x0 >= 0 &&
           context->grey[y*context->width+x0] == context->target; --x0);
      ++x0;
      // go right
      int x1;
      for (x1 = x+1; x1 < context->width &&
           context->grey[y*context->width+x1] == context->target; ++x1);
      // go up
      int y0;
      for (y0 = y-1; y0 >= 0 &&
           context->grey[y0*context->width+x] == context->target; --y0);
      ++y0;
      // go down
      int y1;
      for (y1 = y+1; y1 < context->height &&
           context->grey[y1*context->width+x] == context->target; ++y1);

      // Fill cross with destination color
      for (int i = x0; i < x1; ++i)
        context->grey[y*context->width+i] = context->dest;
      for (int i = y0; i < y1; ++i)
        context->grey[i*context->width+x] = context->dest;

      if (y > 0) { // recursion into points above horizontal line
        unsigned char *row = &context->grey[(y-1)*context->width];
        for (int i = x0; i < x1; ++i)
          if (row[i] == context->target)
            floodfill_r_core(context, i, y-1);
      }
      if (y+1 < context->height) { // recursion into points below horizontal line
        unsigned char *row = &context->grey[(y+1)*context->width];
        for (int i = x0; i < x1; ++i)
          if (row[i] == context->target)
            floodfill_r_core(context, i, y+1);
      }
      if (x > 0) { // recursion into points left of vertical line
        unsigned char *col = &context->grey[x-1];
        for (int i = y0; i < y1; ++i)
          if (col[i*context->width] == context->target)
            floodfill_r_core(context, x-1, i);
      }
      if (x+1 < context->width) { // recursion into points right of vertical line
        unsigned char *col = &context->grey[x+1];
        for (int i = y0; i < y1; ++i)
          if (col[i*context->width] == context->target)
            floodfill_r_core(context, x+1, i);
      }
    }

  • From fir@21:1/5 to Michael S on Tue Mar 19 16:05:50 2024
    Michael S wrote:
    On Mon, 18 Mar 2024 03:00:32 -0700
    Tim Rentsch <[email protected]> wrote:

    bart <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:

    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove correct.


    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    You have evidence to suggest that the opposite is true?

    The claim is that recursion always makes programs harder to reason
    about and prove correct. It's easy to find examples that show
    recursion does not always make programs harder to reason about and
    prove correct.

    I personally find recursion hard work and errors much harder to
    debug.

    Most likely that's because you haven't had the relevant background
    in learning how to program in a functional style. That matches my
    own experience: it was only after learning how to write programs in
    a functional style that I really started to appreciate the benefits
    of using recursion, and to understand how to write and reason about
    recursive programs.

    It also becomes much more important to show that it will not cause
    stack overflow.

    In most cases it's enough to show that the stack depth never exceeds
    log N for an input of size N. I use recursion quite routinely
    without there being any significant danger of stack overflow. It's
    a matter of learning which patterns are safe and which patterns are
    potentially dangerous, and avoiding the dangerous patterns (unless
    certain guarantees can be made to make them safe again).

    The problem in this case is that the max. depth of recursion is O(N), where N
    is the total number of pixels to change color. So far I didn't find an
    obvious way to cut the worst case by more than a small factor without
    turning the recursive algorithm into something that is unrecognizably
    different from the original and requires a proof of correctness of its own.
    The classic "divide and conquer, smaller part first" strategy does not
    appear applicable here, or at least not obviously.
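
    An illustrative sketch (not from the thread) of the "smaller part first" pattern mentioned above, using quicksort as the example. The names quicksort_bounded and qs_max_depth are made up for this sketch. Only the smaller partition is handled by a recursive call and the larger one by the loop, so each call works on at most half the elements of its caller and the depth stays within about log2(N) + 1. Flood fill has no comparable way to split the work, which is the point Michael S makes.

```c
#include <stddef.h>

/* Deepest recursion level seen so far (instrumentation for the sketch). */
static int qs_max_depth = 0;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition with the last element as pivot; returns the pivot index. */
static size_t partition(int *v, size_t lo, size_t hi)
{
    int pivot = v[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (v[j] < pivot)
            swap_int(&v[i++], &v[j]);
    swap_int(&v[i], &v[hi]);
    return i;
}

/* Sorts v[lo..hi] (inclusive). Recurses into the smaller part only and
   loops on the larger part, so every recursive call handles at most half
   of the elements of its caller. */
static void quicksort_bounded(int *v, size_t lo, size_t hi, int depth)
{
    if (depth > qs_max_depth)
        qs_max_depth = depth;
    while (lo < hi) {
        size_t p = partition(v, lo, hi);
        if (p - lo < hi - p) {            /* left part is smaller */
            if (p > lo)
                quicksort_bounded(v, lo, p - 1, depth + 1);
            lo = p + 1;
        } else {                          /* right part is smaller or equal */
            if (p < hi)
                quicksort_bounded(v, p + 1, hi, depth + 1);
            hi = p - 1;                   /* p > lo holds in this branch */
        }
    }
}
```

    Calling quicksort_bounded(v, 0, n - 1, 1) on an array of n >= 1 elements and then reading qs_max_depth shows the bound in practice, even for already sorted input, where a naive quicksort would recurse n levels deep.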

    in reality it is less i guess..
    well that would be like if i wanted to recolor
    a vertical line of say length 2 million pixels
    - i would always go one pixel right, 2 million times

    if this is a 100x100 square and i put the initiation
    in the middle it would go 50x right, then at depth 50
    it would go one up, then i guess 100 times left

    then just like this, line by line, up until the upper edge of the picture
    - then it probably reverts back (with a lot
    of false ifs) to the first line and then goes down

    - so it seems (though i was not checking it
    too much in my head) that the depth in that case
    would be about half

    - but this is because it is a very unfortunate case,
    'normally' i think the recursion depth
    should be more like the distance to the edge of an area

    (i will answer more later as i hate usenet via newsreader,
    it is so inconvenient to read and answer, it's a pain)

    the problem has a couple of aspects imo
    - interesting is in fact the great simplicity
    of this recursion method, esp. in this case - which gives food for thought

  • From bart@21:1/5 to fir on Tue Mar 19 17:16:42 2024
    On 19/03/2024 15:05, fir wrote:
    in reality it is less i guess..
    well that would be like if i wanted to recolor
    a vertical line of say length 2 million pixels
    - i would always go one pixel right, 2 million times

    if this is a 100x100 square and i put the initiation
    in the middle it would go 50x right, then at depth 50
    it would go one up, then i guess 100 times left

    then just like this, line by line, up until the upper edge of the picture
    - then it probably reverts back (with a lot
    of false ifs) to the first line and then goes down

    That's what I thought until I tried it.

    If I start with an 18x18 image of all zeros, then fill starting from the
    centre with a 'colour' that is an incrementing value, then the final
    image displayed as a table of integers looks like this:


    171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
    136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
    135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
    100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
     99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
     64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
     63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
     28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
     27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
    172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
    209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
    208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
    217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
    252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
    253 254 255 325 257 258 259 260 261 262 263 264 265 266 267 268 269 270
    288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
    289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
    324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

    By following the sequence starting from 1, you can see the fill-pattern.

    It's not clear how it gets from 171 at top left to 172 half-way down the
    left edge.

  • From Michael S@21:1/5 to Malcolm McLean on Tue Mar 19 19:18:59 2024
    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:

    On 19/03/2024 11:18, Michael S wrote:
    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
        static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
        UI k = 0;
        UI n = 17;
        Point *todo = malloc( n * sizeof *todo );

        if( todo && change_it( w, h, pixels, p0, old, new ) )
            todo[k++] = p0;

        while( k > 0 ){
            Index j = n-k;
            memmove( todo + j, todo, k * sizeof *todo );
            k = 0;

            while( j < n ){
                Point p = todo[ j++ ];
                for( Index i = 0; i < 4; i++ ){
                    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                    todo[ k++ ] = q;
                }

                if( j-k < 3 ){
                    Index new_n = n+n/4;
                    Index new_j = new_n - (n-j);
                    Point *t = realloc( todo, new_n * sizeof *t );
                    if( !t ){ k = 0; break; }
                    memmove( t + new_j, t + j, (n-j) * sizeof *t );
                    todo = t, n = new_n, j = new_j;
                }
            }
        }

        free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
        if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
        return pixels[p.x][p.y] = new, 1;
    }

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.
    Is it the same algorithm?


    No. Mine takes horizontal scan lines and extends them, then places
    the pixels above and below in a queue to be considered as seeds for
    the next scan line. (It's not mine, but I don't know who invented it.
    It wasn't me.)

    Tim, now what does it do? Essentially it's the recursive fill
    algorithm but with the data only on the stack instead of the call and
    the data. And todo is actually a queue rather than a stack.
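
    An illustrative sketch (not code from the thread) of the distinction drawn above: the same four-neighbour fill as the recursive version, with the pending pixels kept in a heap-allocated stack instead of in call frames. The names floodfill_stack and Cell are made up here, and storing full coordinates per entry is an assumption of this sketch. Popping from the front of the array instead of the end would turn it into the queue-based order that Tim's version uses.

```c
#include <stdlib.h>

typedef struct { int x, y; } Cell;   /* hypothetical helper type */

/* Recolours the 4-connected area of colour `old` around (x, y).
   Returns the number of pixels recoloured, or -1 if memory ran out
   (in which case the image may be partially recoloured). */
int floodfill_stack(unsigned char *pix, int w, int h, int x, int y,
                    unsigned char old, unsigned char new_)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };

    if (old == new_ || x < 0 || x >= w || y < 0 || y >= h)
        return 0;
    if (pix[y * w + x] != old)
        return 0;

    size_t cap = 64, top = 0;
    Cell *stack = malloc(cap * sizeof *stack);
    if (!stack)
        return -1;

    int count = 1;
    pix[y * w + x] = new_;            /* recolour on push: each pixel is pushed once */
    stack[top++] = (Cell){ x, y };

    while (top > 0) {
        Cell c = stack[--top];        /* pop the most recently pushed pixel */
        for (int i = 0; i < 4; i++) {
            int nx = c.x + dx[i], ny = c.y + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                continue;
            if (pix[ny * w + nx] != old)
                continue;
            if (top == cap) {         /* grow the stack geometrically */
                Cell *t = realloc(stack, 2 * cap * sizeof *t);
                if (!t) { free(stack); return -1; }
                stack = t;
                cap *= 2;
            }
            pix[ny * w + nx] = new_;
            count++;
            stack[top++] = (Cell){ nx, ny };
        }
    }
    free(stack);
    return count;
}
```

    The worst-case memory use is still proportional to the number of recoloured pixels, but it comes from the heap, so running out is reported through the return value rather than as a stack overflow.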

    Now why would it be slower? Probably because you usually only hit a
    pixel three times with mine - once below, once above, and once for
    the scan line itself, whilst you consider it 5 times for Tim's - once
    for each neighbour and once for itself. Then horizontally adjacent
    pixels are more likely to be in the same cache line than vertically
    adjacent pixels, so processing images in scan lines tends to be a bit
    faster.


    I did a little more investigation gradually modifying Tim's code for
    improved performance without changing the basic principle of the
    algorithm. Yes, micro-optimization. Yes, I said earlier that doing so
    in c.l.c it is bad sportsmanship. So what? I never claimed to be an
    ideal sportsman.
    The point is that after optimization it's actually faster than the
    best implementations of the original recursive algorithm, including an
    implementation that uses an explicit stack and is quite economical in its
    memory consumption. Tim's algorithm is 8 times less economical (8 bytes
    per saved node vs 1 byte in the explicit stack) and nevertheless almost
    twice as fast for both shapes that I was testing.
    So far, this algorithm is fastest among all "local" algorithms that I
    tried. By "local" I mean algorithms that don't try to recolor more than
    one pixel at time.
    "Non-local" algorithms i.e. yours and my recursive algorithm that
    recolors St. George cross, are somewhat faster, but I suspect that
    it's because all shapes that I use for testing have either long
    columns or long rows or both.
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost nothing
    else.
    The second nice thing is that it is easy to understand. Not as easy as
    original recursive method, but easier than the rest of them.

    If you or somebody else is interested, here is [micro]optimized variant:


    #include <stdlib.h>
    #include <stddef.h>
    #include <string.h>


    typedef unsigned char Color;
    typedef int UI;
    typedef struct { UI x, y; } Point;

    static inline
    Point* circularIncr(Point* p, Point* beg, Point* end) {
        return p + 1 == end ? beg : p + 1;
    }

    static inline
    Point mk_point(int x, int y) {
        Point pt={x,y};
        return pt;
    }

    int floodfill_r(
        Color *pixels,
        int w, int h,
        int pt0_x, int pt0_y,
        Color old, Color new)
    {
        if (pt0_x < 0 || pt0_x >= w || pt0_y < 0 || pt0_y >= h)
            return 0;

        if (pixels[pt0_y*w+pt0_x] != old)
            return 0;

        pixels[pt0_y*w+pt0_x] = new;

        const ptrdiff_t INITIAL_TODO_SIZE = 125;
        Point *todo = malloc( (INITIAL_TODO_SIZE+3) * sizeof *todo );
        // +3 is extra size to assist wrap-around of wr
        if (!todo)
            return -1;
        Point* todo_end = &todo[INITIAL_TODO_SIZE];

        todo[0] = mk_point(pt0_x, pt0_y);
        Point* wr = &todo[1];
        Point* rd = todo;
        ptrdiff_t free_space = INITIAL_TODO_SIZE - 1;
        do {
            Point pt = *rd;
            rd = circularIncr(rd, todo, todo_end);
            Point* prev_wr = wr;
            if (pt.x > 0 && pixels[pt.y*w+pt.x-1] == old) {
                pixels[pt.y*w+pt.x-1] = new;
                *wr++ = mk_point(pt.x-1, pt.y);
            }
            if (pt.y > 0 && pixels[pt.y*w+pt.x-w] == old) {
                pixels[pt.y*w+pt.x-w] = new;
                *wr++ = mk_point(pt.x, pt.y-1);
            }
            if (pt.x+1 < w && pixels[pt.y*w+pt.x+1] == old) {
                pixels[pt.y*w+pt.x+1] = new;
                *wr++ = mk_point(pt.x+1, pt.y);
            }
            if (pt.y+1 < h && pixels[pt.y*w+pt.x+w] == old) {
                pixels[pt.y*w+pt.x+w] = new;
                *wr++ = mk_point(pt.x, pt.y+1);
            }

            free_space += 1 - (wr - prev_wr);
            if (wr >= todo_end) {
                memcpy(todo, todo_end, (wr - todo_end)*sizeof(*wr));
                wr += todo - todo_end;
            }

            if (free_space < 4) {
                ptrdiff_t rdi = rd-todo;
                ptrdiff_t wri = wr-todo;
                ptrdiff_t sz = todo_end - todo;
                ptrdiff_t incr = sz/4;
                Point* new_todo = realloc(todo, (sz+incr+3) * sizeof *todo );
                // +3 is extra size to assist wrap-around of wr
                if(!new_todo) {
                    free(todo);
                    return -1;
                }
                free_space += incr;
                rd = &new_todo[rdi];
                wr = &new_todo[wri];
                todo = new_todo;
                todo_end = &todo[sz+incr];
                if (rd >= wr) {
                    memmove(&rd[incr], rd, (sz-rdi) * sizeof *todo );
                    rd = &rd[incr];
                }
            }
        } while (rd != wr);

        free( todo );
        return 1;
    }

  • From bart@21:1/5 to bart on Tue Mar 19 17:33:56 2024
    On 19/03/2024 17:16, bart wrote:
    On 19/03/2024 15:05, fir wrote:

    if this is a 100x100 square and i put the initiation
    in the middle it would go 50x right, then at depth 50
    it would go one up, then i guess 100 times left

    then just like this, line by line, up until the upper edge of the picture
    - then it probably reverts back (with a lot
    of false ifs) to the first line and then goes down

    That's what I thought until I tried it.

    If I start with an 18x18 image of all zeros, then fill starting from the centre with a 'colour' that is an incrementing value, then the final
    image displayed as a table of integers looks like this:


    171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
    136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
    135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
    100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
     99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
     64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
     63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
     28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
     27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
    172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
    209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
    208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
    217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
    252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
    253 254 255 325 257 258 259 260 261 262 263 264 265 266 267 268 269 270
    288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
    289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
    324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

    It's not clear how it gets from 171 at top left to 172 half-way down the
    left edge.

    Actually, a more revealing picture is produced when storing the
    calldepth in each cell rather than sequence number (these images are now 16-bits/cell rather than 8):

    171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
    136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
    135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
    100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
     99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
     64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
     63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
     28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
     27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
     28  29  30  31  32  33  34  35  36   1   2   3   4   5   6   7   8   9
     65  66  67  68  69  70  71  72  37  38  39  40  41  42  43  44  45  46
     64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47
     65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82
    100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
    101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
    136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119
    137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
    172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155

    At least now I know how it gets from the top left to the 172 in the top
    image: it must do a cascade of returns until it gets back to call depth
    27 in this second chart, then it does the cell immediately below.
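
    An illustrative sketch (not bart's program, which is not shown in the thread) of how such a depth measurement could be made. It fills an all-zero 18x18 image from the centre, trying right, left, up, down in the same order as the OP's function, and records the call depth of every pixel it recolours. The names fill_depth and measure_depth are made up for this sketch.

```c
#include <string.h>

enum { N = 18 };                       /* image is N x N, as in the tables above */

static unsigned char img[N * N];
static int max_depth;                  /* deepest call that recoloured a pixel */
static int filled;                     /* number of pixels recoloured */

/* Recursive 4-neighbour fill of colour 0 with colour 1, tracking call depth. */
static void fill_depth(int x, int y, int depth)
{
    if (x < 0 || x >= N || y < 0 || y >= N)
        return;
    if (img[y * N + x] != 0)
        return;
    img[y * N + x] = 1;
    filled++;
    if (depth > max_depth)
        max_depth = depth;
    fill_depth(x + 1, y, depth + 1);   /* right */
    fill_depth(x - 1, y, depth + 1);   /* left  */
    fill_depth(x, y - 1, depth + 1);   /* up    */
    fill_depth(x, y + 1, depth + 1);   /* down  */
}

/* Clears the image, fills from (x, y) and returns the deepest call level. */
static int measure_depth(int x, int y)
{
    memset(img, 0, sizeof img);
    max_depth = 0;
    filled = 0;
    fill_depth(x, y, 1);
    return max_depth;
}
```

    Starting from (9, 9), the deepest level this reaches is 172 for 324 pixels, matching the largest entry in the second table, so for an empty square the depth is a little over half the pixel count.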

  • From fir@21:1/5 to bart on Tue Mar 19 23:07:30 2024
    bart wrote:
    On 19/03/2024 15:05, fir wrote:
    in reality it is less i guess..
    well that would be like if i wanted to recolor
    a vertical line of say length 2 million pixels
    - i would always go one pixel right, 2 million times

    if this is a 100x100 square and i put the initiation
    in the middle it would go 50x right, then at depth 50
    it would go one up, then i guess 100 times left

    then just like this, line by line, up until the upper edge of the picture
    - then it probably reverts back (with a lot
    of false ifs) to the first line and then goes down

    That's what I thought until I tried it.

    If I start with an 18x18 image of all zeros, then fill starting from the centre with a 'colour' that is an incrementing value, then the final
    image displayed as a table of integers looks like this:


    171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
    136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
    135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
    100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
     99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
     64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
     63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
     28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
     27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
    172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
    209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
    208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
    217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
    252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
    253 254 255 325 257 258 259 260 261 262 263 264 265 266 267 268 269 270
    288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
    289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
    324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

    By following the sequence starting from 1, you can see the fill-pattern.

    It's not clear how it gets from 171 at top left to 172 half-way down the
    left edge.


    well it's exactly what i said and it is clear imo, what's not clear? - simply
    it was developing "try right, try left, try up, try down" until
    it clashes with the upper edge, then gets back to depth level one and continues,
    but now until it meets the lower edge - but fine, you tested it

  • From fir@21:1/5 to Malcolm McLean on Tue Mar 19 23:23:48 2024
    Malcolm McLean wrote:
    On 16/03/2024 18:21, Scott Lurndal wrote:
    Malcolm McLean <[email protected]> writes:
    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion makes programs harder to reason about and prove correct.

    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?


    Example given. A recursive algorithm which is hard to reason about and

    Perhaps hard for _you_ to reason about. That doesn't
    generalize to every other programmer that might read that
    code.


    From experience this one blows the stack, but not always. Sometimes
    it's OK to use.

    Since you can reason about it so easily, you can tell the others when
    you're OK and when you are not, in a handy intuitive way so that someone thinking of implementing it will know.

    from an "effective c" or maybe i'd call it "optimal" point of view,
    recursion as it is implemented in c is wrong (it is sub-optimal),
    so i agree with that statement - however a big part of the world today
    programs in a non-optimal way (for example that whole python and other
    programming ways are non-optimal)..and thus recursion may be found one such way

    ..and i must say when i was younger i was much more fixated on optimal
    coding..today i almost don't code at all and lost interest (i mean
    "being interested") in many things ..i also consider those
    non-optimal cases more interesting - and generally this recursion would
    be bearable - especially if for example windows has this "guard page",
    some exception-based mechanism that would allow resizing the stack
    above its default two megabytes (which is a bit small as for today)
    instead of crashing the application

    i'm not sure however if it's done and if it's possible, as i understand there
    is an exception when trying to read/write immediately after the stack reserve
    but not quite if someone just moves the stack pointer up (like when
    allocating stack space for an array), but those mechanisms could be handy

  • From fir@21:1/5 to David Brown on Tue Mar 19 23:52:30 2024
    David Brown wrote:
    rks, especially if the flood-fill is on live displayed data rather than
    in a buffer off-screen. But typically you need to get a /lot/ more
    advanced (i.e., not your algorithm) to improve on the OP's version by an order of magnitude, so if speed is not essential but understanding that
    it is correct

    this code of mine was the fastest implementation to write when i needed to
    test something in 3 minutes, as the effect looks good
    (i wrote an editor for low resolution drawing where i select the
    given color piece, then if selected, by pressing control or shift
    and moving the mouse i recolorize this component in a fluid way -

    - it is good because you may see which colors fit other colors, and
    some editors like paint don't allow that, so to compose some image with
    fitting colors you get a much, much bigger amount of work)


    btw it can be seen it's written as an ad hoc solution, because from an
    optimization point of view passing old_color and new_color,
    which are always the same (like passing them in the whole branch
    potentially a million times), is nonsense - but this "branch"
    (as i wouldn't call it a function, it's rather a branch) needs
    that data.. and if not passing it as args i would need to make
    standalone variables

  • From fir@21:1/5 to Michael S on Wed Mar 20 00:03:04 2024
    Michael S wrote:
    On Sun, 17 Mar 2024 14:56:34 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 15:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern.
    But not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c


    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Brown, or is it actually true?


    And I need to run some tests, don't I?

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int floodfill_r(unsigned char *grey, int width, int height, int x,
                    int y, unsigned char target, unsigned char dest)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            return 0;
        if (grey[y*width+x] != target)
            return 0;
        grey[y*width+x] = dest;
        floodfill_r(grey, width, height, x - 1, y, target, dest);
        floodfill_r(grey, width, height, x + 1, y, target, dest);
        floodfill_r(grey, width, height, x, y - 1, target, dest);
        floodfill_r(grey, width, height, x, y + 1, target, dest);

        return 0;
    }

    /**
      Floodfill4 - floodfill, 4 connectivity.

      @param[in,out] grey - the image (formally it's greyscale but it
                            could be binary or indexed)
      @param width - image width
      @param height - image height
      @param x - seed point x
      @param y - seed point y
      @param target - the colour to flood
      @param dest - the colour to replace it by.
      @returns Number of pixels flooded.
    */
    int floodfill4(unsigned char *grey, int width, int height, int x, int y,
                   unsigned char target, unsigned char dest)
    {
        int *qx = 0;
        int *qy = 0;
        int qN = 0;
        int qpos = 0;
        int qcapacity = 0;
        int wx, wy;
        int ex, ey;
        int tx, ty;
        int ix;
        int *temp;
        int answer = 0;

        if(grey[y * width + x] != target)
            return 0;
        qx = malloc(width * sizeof(int));
        qy = malloc(width * sizeof(int));
        if(qx == 0 || qy == 0)
            goto error_exit;
        qcapacity = width;
        qx[qpos] = x;
        qy[qpos] = y;
        qN = 1;

        while(qN != 0)
        {
            tx = qx[qpos];
            ty = qy[qpos];
            qpos++;
            qN--;

            if(qpos == 256)
            {
                memmove(qx, qx + 256, qN*sizeof(int));
                memmove(qy, qy + 256, qN*sizeof(int));
                qpos = 0;
            }
            if(grey[ty*width+tx] != target)
                continue;
            wx = tx;
            wy = ty;
            while(wx >= 0 && grey[wy*width+wx] == target)
                wx--;
            wx++;
            ex = tx;
            ey = ty;
            while(ex < width && grey[ey*width+ex] == target)
                ex++;
            ex--;


            for(ix=wx;ix<=ex;ix++)
            {
                grey[ty*width+ix] = dest;
                answer++;
            }

            if(ty > 0)
                for(ix=wx;ix<=ex;ix++)
                {
                    if(grey[(ty-1)*width+ix] == target)
                    {
                        if(qpos + qN == qcapacity)
                        {
                            temp = realloc(qx, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qx = temp;
                            temp = realloc(qy, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qy = temp;
                            qcapacity += width;
                        }
                        qx[qpos+qN] = ix;
                        qy[qpos+qN] = ty-1;
                        qN++;
                    }
                }
            if(ty < height -1)
                for(ix=wx;ix<=ex;ix++)
                {
                    if(grey[(ty+1)*width+ix] == target)
                    {
                        if(qpos + qN == qcapacity)
                        {
                            temp = realloc(qx, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qx = temp;
                            temp = realloc(qy, (qcapacity + width) * sizeof(int));
                            if(temp == 0)
                                goto error_exit;
                            qy = temp;
                            qcapacity += width;
                        }
                        qx[qpos+qN] = ix;
                        qy[qpos+qN] = ty+1;
                        qN++;
                    }
                }
        }

        free(qx);
        free(qy);

        return answer;
    error_exit:
        free(qx);
        free(qy);
        return -1;
    }

    int main(void)
    {
        unsigned char *image;
        clock_t tick, tock;
        int i;

        image = malloc(100 * 100);
        tick = clock();
        for (i = 0 ; i < 10000; i++)
        {
            memset(image, 0, 100 * 100);
            floodfill_r(image, 100, 100, 50, 50, 0, 1);
        }
        tock = clock();
        printf("floodfill_r %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

        tick = clock();
        for (i = 0 ; i < 10000; i++)
        {
            memset(image, 0, 100 * 100);
            floodfill4(image, 100, 100, 50, 50, 0, 1);
        }
        tock = clock();
        printf("floodfill4 %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

        return 0;
    }


    Let's give it a whirl

    malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
    malcolm@Malcolms-iMac cscratch % ./a.out
    floodfill_r 1.69274
    floodfill4 0.336705



    I find your performance measurement non-decisive for two reasons:
    (1) because your test case is too trivial and probably uncharacteristic
    and
    (2) because recursive variant could be trivially rewritten in a way
    that reduces # of stack memory accesses by factor of 2 or 3.
    Like that:

    struct recursive_context_t {
        unsigned char *grey;
        int width, height;
        unsigned char target, dest;
    };

    static void floodfill_r_core(const struct recursive_context_t* context,
                                 int x, int y) {
        if (x < 0 || x >= context->width || y < 0 || y >= context->height)
            return;
        if (context->grey[y*context->width+x] == context->target) {
            context->grey[y*context->width+x] = context->dest;
            floodfill_r_core(context, x - 1, y);
            floodfill_r_core(context, x + 1, y);
            floodfill_r_core(context, x, y - 1);
            floodfill_r_core(context, x, y + 1);
        }
    }

    int floodfill_r(
        unsigned char *grey,
        int width, int height,
        int x, int y,
        unsigned char target, unsigned char dest)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            return 0;
        if (grey[y*width+x] != target)
            return 0;
        struct recursive_context_t context = {
            .grey = grey,
            .width = width,
            .height = height,
            .target = target,
            .dest = dest,
        };
        floodfill_r_core(&context, x, y);
        return 1;
    }


    i'm not quite sure what you do here.. pass the structure? in fact
    the thing you name context you need not pass at all, just make it
    standalone static variables because they are the same for the whole
    "branch" (given recursive branch of recolorization)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }

  • From fir@21:1/5 to Michael S on Wed Mar 20 00:30:56 2024
    Michael S wrote:
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you need not pass at all, just make it
    standalone static variables, because it is the same for the whole
    "branch" (a given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.

    same thread safety as the previous one, and i just say that those new_color
    and old_color arguments i added for convenience - all of it was a
    functional test of whether the recolorisation visibly works, and how,
    in my lowres (bigpixel) graphics editor - there is no need to pass them
    down the stack.. also the test 'if old_color == new_color then return'
    strictly speaking probably does not need to be propagated

    i also used GetPixelSafe unnecessarily - i use two methods: SetPixelSafe,
    which just checks whether x,y is in the array at all, and SetPixelUnsafe,
    which is simply frame[y*frame_width+x] = color

    so if you were interested in speed comparisons you wouldnt need to pass
    the structure at all, and that would be faster

    i agree generally with both mclean and brown in this discussion
    1) its not optimal so its kinda wrong
    2) it is simple so its kinda usable and for some uses its handy, so
    not all accusations of this being wrong are justified

  • From Michael S@21:1/5 to fir on Wed Mar 20 01:17:59 2024
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you need not pass at all, just make it
    standalone static variables, because it is the same for the whole
    "branch" (a given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.
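
    One way to keep the short two-argument recursion without sharing state
    between threads, sketched under the assumption of a C11 compiler that
    supports `_Thread_local`. The names `fill_ctx`, `fill_rec` and `fill_tls`
    are made up for this illustration and are not from the thread:

```c
#include <assert.h>

/* Hypothetical per-thread fill state: every thread gets its own copy. */
static _Thread_local struct {
    unsigned *frame;
    int w, h;
    unsigned old_color, new_color;
} fill_ctx;

/* Only x and y travel down the stack; the rest is read from fill_ctx. */
static void fill_rec(int x, int y)
{
    if (x < 0 || x >= fill_ctx.w || y < 0 || y >= fill_ctx.h) return;
    unsigned *p = &fill_ctx.frame[y * fill_ctx.w + x];
    if (*p != fill_ctx.old_color) return;
    *p = fill_ctx.new_color;
    fill_rec(x + 1, y);
    fill_rec(x - 1, y);
    fill_rec(x, y - 1);
    fill_rec(x, y + 1);
}

/* Entry point: stores the invariant arguments once, then recurses.
   Returns 1 if the start pixel was recoloured, 0 otherwise. */
static int fill_tls(unsigned *frame, int w, int h, int x, int y,
                    unsigned old_color, unsigned new_color)
{
    assert(frame && w > 0 && h > 0);
    if (old_color == new_color) return 0;   /* would never terminate */
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (frame[y * w + x] != old_color) return 0;
    fill_ctx.frame = frame;
    fill_ctx.w = w;
    fill_ctx.h = h;
    fill_ctx.old_color = old_color;
    fill_ctx.new_color = new_color;
    fill_rec(x, y);
    return 1;
}
```

    This only removes the sharing of the colour and frame variables. Two
    threads filling the same frame at the same time would still race on
    the pixels themselves.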

  • From fir@21:1/5 to Michael S on Wed Mar 20 01:13:10 2024
    Michael S wrote:
    On Sat, 16 Mar 2024 11:33:20 +0000
    Malcolm McLean <[email protected]> wrote:

    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more custom
    for my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area
    of of one color into another color (becouse if no someone would
    need to do it by hand pixel by pixel and the need to change color
    of given element is very common)

    there is very simple method of doing it - i men i click in given
    color pixel then replace it by my color and call the same function
    on adjacent 4 pixels (only need check if it is in screen at all and
    if the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
    old_color, unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
    }

    return 0;
    }

    it work but im not quite sure how to estimate the safety of this -
    incidentally as i said i use this editor to low res graphics like
    200x200 pixels or less, and it is only a toll of private use,
    yet i got no time to work on it more than 1-2-3 days i guess but
    still

    is there maybe simple way to improve it?

    This is a cheap and cheerful flood fill. And it's easy to get right
    and shouldn't fall over.

    Except I don't understand why it works at all.
    Can't the fill area have sub-areas that are only connected through a diagonal?



    this is a right remark.. i simply had not thought of it.. but those are kind of
    details, i may just modify the function if i notice i need the
    diagonally connected pixels

    note how the topic was born: i was writing the editor, the simple
    editor is 1-2 days of coding work - in it the "recolorisation
    of a selected (by mouse click) area" is a 30 minute try, then i go further

    i raised the topic here as i felt i had no time to rethink whether it will
    blow up my program or not, but that 30 minute task was meant for 30 minutes,
    not for a multi-hour discussion

    however i often like to post such a piece of code so that it turns into
    a multi-hour discussion, to get firmer ground on some things; then the coding
    becomes more solid

  • From fir@21:1/5 to bart on Wed Mar 20 01:48:54 2024
    bart wrote:
    On 16/03/2024 04:11, fir wrote:
    i was writing simple editor (something like paint but more custom for
    my eventual needs) for big pixel (low resolution) drawing

    it showed in a minute i need a click for changing given drawed area of
    of one color into another color (becouse if no someone would need to
    do it by hand pixel by pixel and the need to change color of given
    element is very common)

    there is very simple method of doing it - i men i click in given color
    pixel then replace it by my color and call the same function on
    adjacent 4 pixels (only need check if it is in screen at all and if
    the color to change is that initial color

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
    }

    return 0;
    }

    it work but im not quite sure how to estimate the safety of this -

    On my machine, it's OK up to a 400x400 image (starting with all one
    colour and filling from the centre with another colour).

    At 500x500, I get stack overflow. At 400x400 the maximum recursion
    depth is 80,000 calls.
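
    A rough way to reproduce this kind of measurement, assuming the same
    neighbour order as the posted function. `fill_depth`,
    `measure_fill_depth` and the two counters are illustrative names, and
    the image size at which the stack actually overflows depends on the
    platform and compiler:

```c
#include <assert.h>

static int cur_depth, max_depth;   /* illustrative instrumentation counters */

static void fill_depth(unsigned char *img, int w, int h, int x, int y,
                       unsigned char old_c, unsigned char new_c)
{
    if (x < 0 || x >= w || y < 0 || y >= h) return;
    if (img[y * w + x] != old_c) return;
    img[y * w + x] = new_c;

    /* Count only calls that recolour a pixel, i.e. live frames doing work. */
    if (++cur_depth > max_depth) max_depth = cur_depth;
    fill_depth(img, w, h, x + 1, y, old_c, new_c);
    fill_depth(img, w, h, x - 1, y, old_c, new_c);
    fill_depth(img, w, h, x, y - 1, old_c, new_c);
    fill_depth(img, w, h, x, y + 1, old_c, new_c);
    --cur_depth;
}

/* Fills from (x, y) and returns the deepest nesting that was reached. */
static int measure_fill_depth(unsigned char *img, int w, int h, int x, int y,
                              unsigned char old_c, unsigned char new_c)
{
    assert(img && old_c != new_c);
    cur_depth = max_depth = 0;
    fill_depth(img, w, h, x, y, old_c, new_c);
    return max_depth;
}
```

    The depth can never exceed the number of pixels in the filled area,
    which gives a simple upper bound for sizing the stack.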


    i was thinking a bit about this recursion more generally and
    i observed that those very deep call chains are a problem of this
    recursion, maybe because it is better suited to being run in parallel

    if you just 'forked' that one call into 4 parallel calls you wouldnt get
    that problem - as it then works 'horizontally' (shallow, like in
    shallow search), not 'vertically' (in-depth, deep search)

    and if someone rewrote it in a non-recursive way then it would be
    natural to rewrite it to work horizontally - which is better


    if someone forked it truly in parallel then the problem of
    synchronisation of ram accesses appears

    this observation however may be seen as a strength of recursion -
    as it naturally shows it works well with micro-parallelisation
    (a crowd of execution channels)

  • From fir@21:1/5 to Peter 'Shaggy' Haywood on Wed Mar 20 01:26:34 2024
    Peter 'Shaggy' Haywood wrote:
    Groovy hepcat Lew Pitcher was jivin' in comp.lang.c on Mon, 18 Mar 2024
    01:27 am. It's a cool scene! Dig it.

    On 16/03/2024 04:11, fir wrote:

    [Snip.]

    int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
    unsigned new_color)
    {
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
    }

    return 0;
    }

    [Snippity doo dah.]

    Take fir's example code above; a simple single call to
    RecolorizePixelAndAdjacentOnes() will effectively recolour the
    origin cell multiple times, because of how the recursion is handled.

    No, I don't think so. You seem to have missed the fact that it checks
    the colour of the "current" pixel, and only continues (setting new
    colour & recursing) if it is the old colour.
    Of course, I'm inferring (guessing) the functionality, at least
    partially (Unsafe? Safe?), of GetPixelUnsafe() and SetPixelSafe() based
    on their names.

    [Snip Lew's examples.]


    Safe and Unsafe mean that Safe checks whether x,y is inside the array of
    pixels, while Unsafe just writes without checking - i draw into an array of
    unsigned 32 bit ARGB or BGRA (i never remember) pixels - then i blit that
    'bitmap' onto the window client area, as can be done in winapi

    here is the exact code

    inline void SetPixelUnsafe(int x, int y, unsigned color)
    {
    extern int frame_size_x ;
    extern int frame_size_y ;
    extern unsigned int* frame_bitmap ;

    frame_bitmap[y*frame_size_x+x]=color;
    }


    inline void SetPixelSafe(int x, int y, unsigned color)
    {
    // if(frame==0) ERROR_EXIT("frame is zero in setpixelsafe ");
    if(x<0) return;
    if(x>frame_size_x-1) return;
    if(y<0) return;
    if(y>frame_size_y-1) return;

    frame_bitmap[y*frame_size_x+x]=color;
    }


    there was a mistake in that function before: since i already check the
    bounds i should be using the Unsafe versions of setpixel and getpixel, but
    i tested this for working, not for optimisation, so i didnt care

  • From fir@21:1/5 to fir on Wed Mar 20 01:36:29 2024
    fir wrote:

    this code of mine was the fastest implementation to write when i needed
    to test something in 3 minutes, and the effect looks good

    i mean probably 30 minutes

  • From fir@21:1/5 to Malcolm McLean on Wed Mar 20 02:15:32 2024
    Malcolm McLean wrote:
    On 16/03/2024 15:09, Malcolm McLean wrote:
    On 16/03/2024 14:40, David Brown wrote:
    On 16/03/2024 12:33, Malcolm McLean wrote:

    And here's some code I wrote a while ago. Use that as a pattern. But
    not sure how well it works. Haven't used it for a long time.

    https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c



    Your implementation is a mess, /vastly/ more difficult to prove
    correct than the OP's original one, and unlikely to be very much
    faster (it will certainly scale in the same way in both time and
    memory usage).

    Now is this David Brown being David Brown, or is it actually true?

    And I need to run some tests, don't I?

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int floodfill_r(unsigned char *grey, int width, int height, int x, int y, unsigned char target, unsigned char dest)
    {
    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;
    if (grey[y*width+x] != target)
    return 0;
    grey[y*width+x] = dest;
    floodfill_r(grey, width, height, x - 1, y, target, dest);
    floodfill_r(grey, width, height, x + 1, y, target, dest);
    floodfill_r(grey, width, height, x, y - 1, target, dest);
    floodfill_r(grey, width, height, x, y + 1, target, dest);

    return 0;
    }

    if someone wanted a simpler version i would write

    // 'presenting' version: no bounds check, so the area must not touch the edge of map
    void recolorize_pixel_chain(int x, int y)
    {
    if(map[y][x]==color_to_replace)
    {
    map[y][x]=replacement_color;

    recolorize_pixel_chain(x+1, y);
    recolorize_pixel_chain(x-1, y);
    recolorize_pixel_chain(x, y+1);
    recolorize_pixel_chain(x, y-1);
    }
    }

    but for practical coding the one with longer names is more practical
    imo - this one above is more for 'presentation'

  • From Tim Rentsch@21:1/5 to Malcolm McLean on Tue Mar 19 21:19:19 2024
    Malcolm McLean <[email protected]> writes:

    On 19/03/2024 05:54, Tim Rentsch wrote:

    Malcolm McLean <[email protected]> writes:

    On 18/03/2024 09:30, Tim Rentsch wrote:

    Michael S <[email protected]> writes:

    [...]

    Except I don't understand why it works at all.
    Can't the fill area have sub-areas that are only connected through
    a diagonal?

    It is customary in raster graphics to count pixels as adjacent
    only if they share an edge, not if they just share a corner.
    Usually that gives better results; the exceptions tend to need
    special handling anyway and not just connecting through
    diagonals.

    Though with a binary image, if the foreground is 4-connected, the
    background must therefore be 8-connected.

    It might be but it doesn't have to be.

    Also different terminology should be used, since 4-connected
    (also N-connected, for other integer N) has a specific meaning in
    graph theory, and one very different than what is meant above.

    That is the terminology in binary image processing. The pixels are 4-connected or 8-connected depending on whether a shared corner is
    considered to make the group of pixels two objects or one object.

    A poor choice of terminology. Side adjacent or corner and side
    adjacent would be better.
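
    Whatever the names, the two adjacency rules differ only in the
    neighbour table. A minimal sketch, with `fill_adj` and its `corners`
    flag invented for the illustration, using an explicit heap-allocated
    stack instead of recursion:

```c
#include <assert.h>
#include <stdlib.h>

/* The four side neighbours come first, then the four corner neighbours. */
static const int DX[8] = { 1, -1, 0,  0, 1,  1, -1, -1 };
static const int DY[8] = { 0,  0, 1, -1, 1, -1,  1, -1 };

/* corners == 0: side adjacent only; corners != 0: side and corner adjacent.
   Returns the number of recoloured pixels, or -1 if allocation fails. */
static long fill_adj(unsigned char *img, int w, int h, int x, int y,
                     unsigned char old_c, unsigned char new_c, int corners)
{
    assert(img && w > 0 && h > 0);
    if (old_c == new_c) return 0;
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (img[y * w + x] != old_c) return 0;

    /* A pixel is recoloured when pushed, so it is pushed at most once
       and w*h entries are always enough. */
    int *stack = malloc((size_t)w * (size_t)h * sizeof *stack);
    if (!stack) return -1;

    long top = 0, count = 0;
    int nn = corners ? 8 : 4;

    img[y * w + x] = new_c;
    stack[top++] = y * w + x;
    while (top > 0) {
        int p = stack[--top];
        int px = p % w, py = p / w;
        ++count;
        for (int i = 0; i < nn; i++) {
            int qx = px + DX[i], qy = py + DY[i];
            if (qx < 0 || qx >= w || qy < 0 || qy >= h) continue;
            if (img[qy * w + qx] != old_c) continue;
            img[qy * w + qx] = new_c;
            stack[top++] = qy * w + qx;
        }
    }
    free(stack);
    return count;
}
```

    On a 2x2 image whose only matching pixels are the two on one diagonal,
    the side-adjacent rule recolours one pixel and the corner-and-side rule
    recolours both.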

  • From Tim Rentsch@21:1/5 to Michael S on Tue Mar 19 21:43:33 2024
    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:

    No. Mine takes horizontal scan lines and extends them, then places
    the pixels above and below in a queue to be considered as seeds for
    the next scan line. (It's not mine, but I don't know who invented it.
    It wasn't me.)

    Tim, now what does it do? Essentially it's the recursive fill
    algorithm but with the data only on the stack instead of the call and
    the data. And todo is actually a queue rather than a stack.

    Now why would it be slower? Probably because you usually only hit a
    pixel three times with mine - once below, once above, and once for
    the scan line itself, while you consider it 5 times for Tim's - once
    for each neighbour and once for itself. Then horizontally adjacent
    pixels are more likely to be in the same cache line than vertically
    adjacent pixels, so processing images in scan lines tends to be a bit
    faster.

    Below is a variant of recursive algorithm that is approximately as
    fast as your code (1.25x faster for filling solid rectangle, 1.43x
    slower for filling snake shape).
    The code is a bit long, but I hope that the logic is still obvious and
    there is no need to prove correctness. [...]

    To me it looks like this recursive algorithm doesn't find all
    pixels that need coloring in some situations.

  • From Tim Rentsch@21:1/5 to Michael S on Tue Mar 19 21:40:22 2024
    Michael S <[email protected]> writes:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    UI k = 0;
    UI n = 17;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) )
    todo[k++] = p0;

    while( k > 0 ){
    Index j = n-k;
    memmove( todo + j, todo, k * sizeof *todo );
    k = 0;

    while( j < n ){
    Point p = todo[ j++ ];
    for( Index i = 0; i < 4; i++ ){
    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
    todo[ k++ ] = q;
    }

    if( j-k < 3 ){
    Index new_n = n+n/4;
    Index new_j = new_n - (n-j);
    Point *t = realloc( todo, new_n * sizeof *t );
    if( !t ){ k = 0; break; }
    memmove( t + new_j, t + j, (n-j) * sizeof *t );
    todo = t, n = new_n, j = new_j;
    }
    }
    }

    free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
    }

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.

    Slower with some shapes, faster in others. In any case
    the code was written for clarity of presentation, with
    no attention paid to low-level performance.

    Is it the same algorithm?

    Sorry, the same algorithm as what? The same as Malcolm's?
    Definitely not. The same as my other posting that does
    not do dynamic reallocation? Yes in the sense that if the
    allocated array is large enough to begin with then no
    reallocations are needed.

    Besides, I don't think that use of VLA in library code is a good idea.
    VLA is optional in latest C standards. And incompatible with C++.

    The code uses a variably modified type, not a variable length
    array. Again, the choice is for clarity of presentation. If
    someone wants to get rid of the variably modified types, it's
    very easy to do, literally a five minute task. Anyway the
    interface is poorly designed to start with, there are bigger
    problems than just whether a variably modified type is used.
    (I chose the interface I did to approximate the interface
    used in Malcolm's code.)
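
    To illustrate the distinction, a small sketch (the helper names `at_vm`
    and `at_flat` are made up for this example). The first function has a
    parameter of variably modified type, as in the code above, and no
    variable length array object is created anywhere. The second is the
    mechanical rewrite with a flat pointer and explicit index arithmetic:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Color;
typedef unsigned int UI;

/* pixels is adjusted to type Color (*)[h]: a pointer to arrays of h Colors.
   The type depends on a run-time value, but nothing is allocated here. */
static Color at_vm(UI w, UI h, Color pixels[w][h], UI x, UI y)
{
    assert(x < w && y < h);
    return pixels[x][y];
}

/* Same element, addressed through a plain pointer. */
static Color at_flat(UI w, UI h, const Color *pixels, UI x, UI y)
{
    assert(x < w && y < h);
    return pixels[(size_t)x * h + y];
}
```

    A caller holding a flat buffer `buf` of `w*h` Colors could pass
    `(Color (*)[h])buf` to the first form and `buf` to the second. A
    compiler without support for variably modified types would reject the
    first function but accept the second.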

    If someone wants to use the functionality from C++, it's
    easy enough to write a C wrapper function to do that.
    IMO C++ has diverged sufficiently from C so that there
    is little to be gained by trying to make code interoperable
    between the two languages.

  • From Michael S@21:1/5 to fir on Wed Mar 20 10:41:55 2024
    On Wed, 20 Mar 2024 01:13:10 +0100
    fir <[email protected]> wrote:

    i raised the topic here as i felt i had no time to rethink whether it will
    blow up my program or not, but that 30 minute task was meant for 30 minutes,
    not for a multi-hour discussion


    So you got the answer rather quickly and the answer is:
    "Yes, in the worst case it can consume a lot of stack. Don't use this
    simple and elegant algorithm unless you have full control over the size
    of the images, the size of the stack, and the size of the stack frame
    generated by the compiler for each recursive call."

  • From fir@21:1/5 to fir on Wed Mar 20 09:39:41 2024
    fir wrote:

    inline void RecolorizePixelAndSpawnNewPixelArea(int x, int y)
    {

    as i use the word area in a double meaning here it should be renamed to something like

    inline void RecolorizePixelAndSpawnNewPixelImmediateVicinity(int x, int y)

    (generally i use function names almost this long in my code but dont
    write comments at all,
    as everything is then self-commenting imo.. for variable names i
    use shorter ones, but sometimes it seems that some can also be a bit longer,
    like list_of_pixels_bot here etc)

  • From David Brown@21:1/5 to bart on Wed Mar 20 09:29:54 2024
    On 19/03/2024 18:16, bart wrote:
    On 19/03/2024 15:05, fir wrote:

    if this is a 100x100 square and i put the initiation
    in the middle it would go 50x right, then at depth 50
    it would go one up, then i guess 100 times left

    then just like this, line by line, up to the upper edge of the picture
    - then it probably reverts back (with a lot
    of false ifs) to the first line and then goes down

    That's what I thought until I tried it.

    If I start with an 18x18 image of all zeros, then fill starting from the centre with a 'colour' that is an incrementing value, then the final
    image displayed as a table of integers looks like this:


    171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
    136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
    135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
    100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
     99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
     64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
     63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
     28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
     27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
    172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
    209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
    208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
    217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
    252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
    253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
    288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
    289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
    324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

    By following the sequence starting from 1, you can see the fill-pattern.

    It's not clear how it gets from 171 at top left to 172 half-way down the
    left edge.


    After the sequence hits the end at 171, it backtracks down the numbers.
    27 is the first it reaches where there is a zero square neighbour, so it
    goes down from there - and the next number in the sequence is 172. Then
    it is free to move to the right again (then down after moving right is
    blocked at 180).

    I think your posts here give a very nice and clear way to view the
    working of the algorithm. Thanks for doing that.
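
    For anyone who wants to reproduce the table, a minimal sketch, assuming
    the original function's neighbour order (x+1, x-1, y-1, y+1) and a
    start at cell (9,9) of an 18x18 grid of zeros. `trace_fill`,
    `trace_from` and `next_value` are illustrative names, not bart's code:

```c
#include <assert.h>

static int next_value;   /* illustrative counter: last "colour" written */

/* Writes 1, 2, 3, ... into the cells in the order the recursion visits them.
   A cell still holding 0 is one that has not been visited yet. */
static void trace_fill(int *grid, int w, int h, int x, int y)
{
    if (x < 0 || x >= w || y < 0 || y >= h) return;
    if (grid[y * w + x] != 0) return;
    grid[y * w + x] = ++next_value;
    trace_fill(grid, w, h, x + 1, y);
    trace_fill(grid, w, h, x - 1, y);
    trace_fill(grid, w, h, x, y - 1);
    trace_fill(grid, w, h, x, y + 1);
}

/* Resets the counter and traces a fill starting at (x, y). */
static void trace_from(int *grid, int w, int h, int x, int y)
{
    assert(grid && w > 0 && h > 0);
    next_value = 0;
    trace_fill(grid, w, h, x, y);
}
```

    Printing the grid row by row after `trace_from(grid, 18, 18, 9, 9)`
    should give the snake pattern shown above, including the jump from 171
    in the top-left corner to 172 below cell 27.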

  • From fir@21:1/5 to Michael S on Wed Mar 20 09:27:47 2024
    Michael S wrote:
    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:

    On 19/03/2024 11:18, Michael S wrote:
    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    UI k = 0;
    UI n = 17;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) )
    todo[k++] = p0;

    while( k > 0 ){
    Index j = n-k;
    memmove( todo + j, todo, k * sizeof *todo );
    k = 0;

    while( j < n ){
    Point p = todo[ j++ ];
    for( Index i = 0; i < 4; i++ ){
    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
    todo[ k++ ] = q;
    }

    if( j-k < 3 ){
    Index new_n = n+n/4;
    Index new_j = new_n - (n-j);
    Point *t = realloc( todo, new_n * sizeof *t );
    if( !t ){ k = 0; break; }
    memmove( t + new_j, t + j, (n-j) * sizeof *t );
    todo = t, n = new_n, j = new_j;
    }
    }
    }

    free( todo );
    }

    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
    }

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.
    Is it the same algorithm?


    No. Mine takes horizontal scan lines and extends them, then places
    the pixels above and below in a queue to be considered as seeds for
    the next scan line. (It's not mine, but I don't know who invented it.
    It wasn't me.)

    Tim, now what does it do? Essentially it's the recursive fill
    algorithm but with the data only on the stack instead of the call and
    the data. And todo is actually a queue rather than a stack.

    Now why would it be slower? Probably because you usually only hit a
    pixel three times with mine - once below, once above, and once for
    the scan line itself, whilst you consider it 5 times for Tim's - once
    for each neighbour and once for itself. Then horizontally adjacent
    pixels are more likely to be in the same cache line than vertically
    adjacent pixels, so processing images in scan lines tends to be a bit
    faster.


    I did a little more investigation, gradually modifying Tim's code for
    improved performance without changing the basic principle of the
    algorithm. Yes, micro-optimization. Yes, I said earlier that doing so
    in c.l.c is bad sportsmanship. So what? I never claimed to be an
    ideal sportsman.
    The point is that after optimizations it's actually faster than the
    best implementations of the original recursive algorithm, including an
    implementation that uses an explicit stack and is quite economical in its
    memory consumption. Tim's algorithm is 8 times less economical (8 bytes
    per saved node vs 1 byte in the explicit stack) and nevertheless almost
    twice as fast for both shapes that I was testing.
    So far, this algorithm is the fastest among all "local" algorithms that I
    tried. By "local" I mean algorithms that don't try to recolor more than
    one pixel at a time.
    "Non-local" algorithms, i.e. yours and my recursive algorithm that
    recolors a St. George cross, are somewhat faster, but I suspect that
    it's because all shapes that I use for testing have either long
    columns or long rows or both.
    The nice thing about Tim's method is that we can expect that
    performance depends on the number of recolored pixels and almost nothing
    else.
    The second nice thing is that it is easy to understand. Not as easy as
    the original recursive method, but easier than the rest of them.

    If you or somebody else is interested, here is [micro]optimized variant:


    #include <stdlib.h>
    #include <stddef.h>
    #include <string.h>


    typedef unsigned char Color;
    typedef int UI;
    typedef struct { UI x, y; } Point;

    static inline
    Point* circularIncr(Point* p, Point* beg, Point* end) {
    return p + 1 == end ? beg : p + 1;
    }

    static inline
    Point mk_point(int x, int y) {
    Point pt={x,y};
    return pt;
    }

    int floodfill_r(
    Color *pixels,
    int w, int h,
    int pt0_x, int pt0_y,
    Color old, Color new)
    {
    if (pt0_x < 0 || pt0_x >= w || pt0_y < 0 || pt0_y >= h)
    return 0;

    if (pixels[pt0_y*w+pt0_x] != old)
    return 0;

    pixels[pt0_y*w+pt0_x] = new;

    const ptrdiff_t INITIAL_TODO_SIZE = 125;
    Point *todo = malloc( (INITIAL_TODO_SIZE+3) * sizeof *todo );
    // +3 is extra size to assist wrap-around of wr
    if (!todo)
    return -1;
    Point* todo_end = &todo[INITIAL_TODO_SIZE];

    todo[0] = mk_point(pt0_x, pt0_y);
    Point* wr = &todo[1];
    Point* rd = todo;
    ptrdiff_t free_space = INITIAL_TODO_SIZE - 1;
    do {
    Point pt = *rd;
    rd = circularIncr(rd, todo, todo_end);
    Point* prev_wr = wr;
    if (pt.x > 0 && pixels[pt.y*w+pt.x-1] == old) {
    pixels[pt.y*w+pt.x-1] = new;
    *wr++ = mk_point(pt.x-1, pt.y);
    }
    if (pt.y > 0 && pixels[pt.y*w+pt.x-w] == old) {
    pixels[pt.y*w+pt.x-w] = new;
    *wr++ = mk_point(pt.x, pt.y-1);
    }
    if (pt.x+1 < w && pixels[pt.y*w+pt.x+1] == old) {
    pixels[pt.y*w+pt.x+1] = new;
    *wr++ = mk_point(pt.x+1, pt.y);
    }
    if (pt.y+1 < h && pixels[pt.y*w+pt.x+w] == old) {
    pixels[pt.y*w+pt.x+w] = new;
    *wr++ = mk_point(pt.x, pt.y+1);
    }

    free_space += 1 - (wr - prev_wr);
    if (wr >= todo_end) {
    memcpy(todo, todo_end, (wr - todo_end)*sizeof(*wr));
    wr += todo - todo_end;
    }

    if (free_space < 4) {
    ptrdiff_t rdi = rd-todo;
    ptrdiff_t wri = wr-todo;
    ptrdiff_t sz = todo_end - todo;
    ptrdiff_t incr = sz/4;
    Point* new_todo = realloc(todo, (sz+incr+3) * sizeof *todo );
    // +3 is extra size to assist wrap-around of wr
    if(!new_todo) {
    free(todo);
    return -1;
    }
    free_space += incr;
    rd = &new_todo[rdi];
    wr = &new_todo[wri];
    todo = new_todo;
    todo_end = &todo[sz+incr];
    if (rd >= wr) {
    memmove(&rd[incr], rd, (sz-rdi) * sizeof *todo );
    rd = &rd[incr];
    }
    }
    } while (rd != wr);

    free( todo );
    return 1;
    }




    if i wrote it non-recursively it would probably be something like this

    [1] (in the simpler static-table based way; generally
    in recent years i prefer realloc based resizable
    ones, so i would still need to rewrite it)
    [2] not tested, but it is a draft of the code as i would attempt to write
    it (some like short names so would change "list_of_pixels"
    into "pixels" etc)




    enum { list_of_pixels_max = 10*1000*1000 }; // enum constant: legal as a file-scope array size in C
    struct {int x, y;} list_of_pixels[list_of_pixels_max];

    int list_of_pixels_top = 0; //index of the next free slot
    int list_of_pixels_bot = 0; //pointer to element to consume

    static inline void AddPixelToList(int x, int y)
    {
    list_of_pixels[list_of_pixels_top].x = x;
    list_of_pixels[list_of_pixels_top].y = y;
    list_of_pixels_top++;

    // if(list_of_pixels_top>=list_of_pixels_max) ERROR_EXIT("overflow in list of pixels")

    }

    int color_to_replace = 0;
    int replacing_color = 0;

    static inline void RecolorizePixelAndSpawnNewPixelArea(int x, int y)
    {
    if(!IsInFrame(x,y)) return;

    int color_here = GetPixelUnsafe(x,y);
    if(color_here==color_to_replace)
    {
    SetPixelUnsafe(x,y, replacing_color);
    //queue the 4 side neighbours; each is checked when it is consumed
    AddPixelToList( x+1, y);
    AddPixelToList( x-1, y);
    AddPixelToList( x, y+1);
    AddPixelToList( x, y-1);
    }
    }

    void RecolorizePixelArea(int x, int y, int color_to_replace_, int replacing_color_)
    {
    //with equal colours every pixel would be queued again and again
    if(color_to_replace_ == replacing_color_) return;

    color_to_replace = color_to_replace_;
    replacing_color = replacing_color_;

    list_of_pixels_top = 0;
    list_of_pixels_bot = 0;

    RecolorizePixelAndSpawnNewPixelArea(x,y);

    while(list_of_pixels_bot<list_of_pixels_top)
    {
    RecolorizePixelAndSpawnNewPixelArea(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);
    list_of_pixels_bot++;
    }

    }

  • From Michael S@21:1/5 to Tim Rentsch on Wed Mar 20 10:56:47 2024
    On Tue, 19 Mar 2024 21:43:33 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:

    No. Mine takes horizontal scan lines and extends them, then places
    the pixels above and below in a queue to be considered as seeds for
    the next scan line. (It's not mine, but I don't know who invented
    it. It wasn't me.)

    Tim, now what does it do? Essentially it's the recursive fill
    algorithm but with the data only on the stack instead of the call
    and the data. And todo is actually a queue rather than a stack.

    Now why would it be slower? Probably because you usually only hit a
    pixel three times with mine - once below, once above, and once for
    the scan line itself, while you consider it 5 times for Tim's -
    once for each neighbour and once for itself. Then horizontally
    adjacent pixels are more likely to be in the same cache line than
    vertically adjacent pixels, so processing images in scan lines
    tends to be a bit faster.

    Below is a variant of recursive algorithm that is approximately as
    fast as your code (1.25x faster for filling solid rectangle, 1.43x
    slower for filling snake shape).
    The code is a bit long, but I hope that the logic is still obvious
    and there is no need to prove correctness. [...]

    To me it looks like this recursive algorithm doesn't find all
    pixels that need coloring in some situations.

    Last night I had a few doubts myself, but after further thought I
    came to the conclusion that it works in all situations.

  • From fir@21:1/5 to fir on Wed Mar 20 09:51:02 2024
    fir wrote:

    RecolorizePixelAndSpawnNewPixelArea(x,y);

    while(list_of_pixels_bot<list_of_pixels_top)
    {

    RecolorizePixelAndSpawnNewPixelArea(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);

    list_of_pixels_bot++;
    }

    }



    maybe this is an example of a case where do {} while() would work better
    than while,

    do {
        RecolorizePixelAndSpawnNewPixelImmediateVicinity(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);
        list_of_pixels_bot++;
    } while(list_of_pixels_bot<list_of_pixels_top);

    (with the starting pixel put on the list first), but that would need
    to be checked, as such loops are confusing

    if so i think it is a somewhat general scheme

    do {something();} while(bot<top);

    for rewriting recursion probably
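
    A minimal sketch of that do/while scheme, under stated assumptions: the
    8x8 grid, the names grid, list, visit and recolor_area are made up for
    the illustration and are not the editor's real functions. It also
    differs from the code above in one respect: a pixel is recolored before
    it is put on the list, so each pixel is listed at most once and the
    list can never need more than W*H entries.

```c
#include <assert.h>

enum { W = 8, H = 8, LIST_MAX = W * H };   /* assumed sizes for the sketch */

static unsigned char grid[H][W];           /* hypothetical stand-in for the frame */
static struct { int x, y; } list[LIST_MAX];
static int top, bot;

/* Recolor (x,y) if it is in the frame and has the old color, then list it.
   Recoloring before listing means each pixel is listed at most once. */
static void visit(int x, int y, unsigned char old_c, unsigned char new_c)
{
    if (x < 0 || x >= W || y < 0 || y >= H) return;
    if (grid[y][x] != old_c) return;
    grid[y][x] = new_c;
    list[top].x = x;
    list[top].y = y;
    top++;
}

/* Returns the number of recolored pixels. */
static int recolor_area(int x, int y, unsigned char old_c, unsigned char new_c)
{
    top = bot = 0;
    if (old_c == new_c) return 0;
    visit(x, y, old_c, new_c);          /* seed the list with the start pixel */
    if (top == 0) return 0;             /* do/while must not run on an empty list */

    do {
        int cx = list[bot].x, cy = list[bot].y;
        bot++;
        visit(cx + 1, cy, old_c, new_c);
        visit(cx - 1, cy, old_c, new_c);
        visit(cx, cy + 1, old_c, new_c);
        visit(cx, cy - 1, old_c, new_c);
    } while (bot < top);

    return top;
}
```

    The guard before the loop is the price of do/while here: the body
    reads list[bot] before the condition is tested, so the list has to be
    non-empty on entry.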

  • From Michael S@21:1/5 to fir on Wed Mar 20 12:06:04 2024
    On Wed, 20 Mar 2024 00:30:56 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you may not pass at all just make it
    standalone static variables because they/it is the same for whole
    "branch" (given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.

    same thread safety as previous,

    The same as your previous.
    But I was modifying Malcolm's recursive variant rather than yours.
    Malcolm's was thread-safe.

  • From Michael S@21:1/5 to Tim Rentsch on Wed Mar 20 11:54:16 2024
    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.


    #include <stdlib.h>
    #include <string.h>

    typedef unsigned char Color;
    typedef unsigned int UI;
    typedef struct { UI x, y; } Point;
    typedef unsigned int Index;

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

    void
    fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
        static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
        UI k = 0;
        UI n = 17;
        Point *todo = malloc( n * sizeof *todo );

        /* with old == new every pixel would keep matching, so skip that case */
        if( todo && old != new && change_it( w, h, pixels, p0, old, new ) )
            todo[k++] = p0;

        while( k > 0 ){
            /* move the pending points to the top end of the buffer,
               freeing the bottom end for newly found ones */
            Index j = n-k;
            memmove( todo + j, todo, k * sizeof *todo );
            k = 0;

            while( j < n ){
                Point p = todo[ j++ ];
                for( Index i = 0; i < 4; i++ ){
                    Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                    if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                    todo[ k++ ] = q;
                }

                /* too little room left for the next point's neighbours: grow */
                if( j-k < 3 ){
                    Index new_n = n+n/4;
                    Index new_j = new_n - (n-j);
                    Point *t = realloc( todo, new_n * sizeof *t );
                    if( !t ){ k = 0; break; }
                    memmove( t + new_j, t + j, (n-j) * sizeof *t );
                    todo = t, n = new_n, j = new_j;
                }
            }
        }

        free( todo );
    }

    /* recolor p and return 1 if it is inside the field and has the old color */
    _Bool
    change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
        if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
        return pixels[p.x][p.y] = new, 1;
    }

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.

    Slower with some shapes, faster in others.

    In my small test suite I found no cases where this specific code is
    measurably faster than Malcolm's code.
    I did find one case in which they are approximately equal. I call it
    the "slalom shape", and it's more or less designed to be the worst
    case for algorithms that try to speed themselves up by taking
    advantage of straight lines.
    The slalom shape is generated by the following code:

    #include <string.h>

    static
    void make_slalom(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c)
    {
        const int n_col = width/3;
        const int n_row = (height-3)/4;

        // top row
        // P B B P P P
        for (int col = 0; col < n_col; ++col) {
            unsigned char c = (col & 1)==0 ? background_c : pen_c;
            image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
        }

        // main image: consists of 3x4 blocks filled by following pattern
        // P B B
        // P P B
        // B P B
        // P P B
        for (int row = 0; row < n_row; ++row) {
            for (int col = 0; col < n_col; ++col) {
                unsigned char* p = &image[(row*4+1)*width+col*3];
                p[0] = pen_c; p[1] = background_c; p[2] = background_c;
                p += width;
                p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
                p += width;
                p[0] = background_c; p[1] = pen_c; p[2] = background_c;
                p += width;
                p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
            }
        }

        // near-bottom rows
        // P B B
        for (int y = n_row*4+1; y < height-1; ++y) {
            for (int col = 0; col < n_col; ++col) {
                unsigned char* p = &image[y*width+col*3];
                p[0] = pen_c; p[1] = background_c; p[2] = background_c;
            }
        }

        // bottom row - all P
        memset(&image[(height-1)*width], pen_c, width);

        // rightmost columns
        for (int x = n_col*3; x < width; ++x) {
            for (int y = 0; y < height-1; ++y)
                image[y*width+x] = background_c;
        }
    }

    In any case
    the code was written for clarity of presentation, with
    no attention paid to low-level performance.


    Yes, your code is easy to understand. It could have been easier still
    if the persistent indices had more meaningful names.
    In another post I showed an optimized variant of your algorithm:
    - 4-neighbors loop unrolled. Most of the speed-up comes not from
    the unrolling itself, but from the specialization of the in-rectangle
    check that the unrolling enables.
    - Todo queue implemented as a circular buffer.
    - Initial size of the queue increased.
    This optimized variant is more competitive with 'line-grabby'
    algorithms in filling solid shapes and faster than them in the
    'slalom' case.

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack; my
    intuition suggested otherwise, but facts are facts.

    Is it the same algorithm?

    Sorry, the same algorithm as what? The same as Malcolm's?

    Yes, that's what I meant.
    I still haven't found the guts to try to understand what Malcolm's
    code is doing.

    Definitely not. The same as my other posting that does
    not do dynamic reallocation? Yes in the sense that if the
    allocated array is large enough to begin with then no
    reallocations are needed.



    Besides, I don't think that use of VLA in library code is a good
    idea. VLA is optional in latest C standards. And incompatible with
    C++.

    The code uses a variably modified type, not a variable length
    array.

    I am not sufficiently versed in C Standard terminology to see a
    difference.
    Aren't they both introduced in C99 and made optional in later standards?

    Again, the choice is for clarity of presentation. If
    someone wants to get rid of the variably modified types, it's
    very easy to do, literally a five minute task.

    Yes, that's what it took for me.
    But I knew that variably modified types exist, even if I didn't know
    that they are called such.
    OTOH, many (the majority?) of C programmers have never heard of them.

    Anyway the
    interface is poorly designed to start with, there are bigger
    problems than just whether a variably modified type is used.
    (I chose the interface I did to approximate the interface
    used in Malcolm's code.)


    That's true.
    The biggest problem with Malcolm's interface is that the logical width
    of the image is the same as the physical width (a.k.a. line pitch; in
    LAPACK it is called the leading dimension). These parameters should be
    separate.
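
    To illustrate the distinction, here is a small sketch with the two
    parameters kept separate. The function name fill_rect and its
    signature are hypothetical, not taken from any code in this thread.
    With a separate pitch the same routine can work on a sub-window of a
    larger image.

```c
#include <assert.h>
#include <string.h>

/* Set every pixel of a width x height window to c.
   first points at the window's top-left pixel; consecutive rows of the
   underlying image are pitch bytes apart (pitch >= width). */
static void fill_rect(unsigned char *first, int width, int height,
                      int pitch, unsigned char c)
{
    for (int y = 0; y < height; y++)
        memset(first + (size_t)y * pitch, c, (size_t)width);
}
```

    For example, with an image whose rows are 8 bytes apart,
    fill_rect(&img[1*8 + 2], 3, 2, 8, 7) touches only columns 2..4 of
    rows 1 and 2.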

    If someone wants to use the functionality from C++, it's
    easy enough to write a C wrapper function to do that.
    IMO C++ has diverged sufficiently from C so that there
    is little to be gained by trying to make code interoperable
    between the two languages.

    From the practical perspective, the biggest obstacle is that your code
    can't be compiled with popular Microsoft compilers.

  • From Ben Bacarisse@21:1/5 to Michael S on Wed Mar 20 10:23:45 2024
    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <[email protected]> wrote:
    Michael S <[email protected]> writes:
    ...
    Tim Rentsch <[email protected]> writes:
    ...
    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
    Color );

    Besides, I don't think that use of VLA in library code is a good
    idea. VLA is optional in latest C standards. And incompatible with
    C++.

    The code uses a variably modified type, not a variable length
    array.

    I am not sufficiently versed in C Standard terminology to see a
    difference.

    A VLA is a declared object -- an array with a size that is not a
    compile-time constant. A variably modified type is just a type, not an
    object. Obviously one can use such a type to declare a VLA, but when it
    is the type of a function parameter, there need be no declared object
    with that type. Usually the associated function argument will have been
    dynamically allocated.

    Aren't they both introduced in C99 and made optional in later
    standards?

    I think so but that's a shame since VMTs are very helpful for writing
    array code. They avoid the need to keep calculating the index with
    multiplications.

    Making both optional was a classic case of throwing the baby out with
    the bath water. Few of the objections raised about VLAs apply to VMTs.
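
    A short sketch of the distinction, for readers who have not met it.
    The names sum_grid and demo are invented for the illustration. The
    parameter and the local pointer both have variably modified types, yet
    no variable length array object is ever declared: the storage comes
    from malloc. It needs a compiler that supports variably modified types.

```c
#include <assert.h>
#include <stdlib.h>

/* rows and cols are run-time values, so int [rows][cols] is a variably
   modified type. The parameter is really a pointer to int[cols]. */
static long sum_grid(size_t rows, size_t cols, int grid[rows][cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r][c];        /* compiler computes r*cols + c */
    return total;
}

/* Fills a rows x cols grid with 0, 1, 2, ... and returns the sum,
   or -1 if the allocation fails. */
static long demo(size_t rows, size_t cols)
{
    /* pointer to a variably modified type; storage from malloc, no VLA */
    int (*grid)[cols] = malloc(rows * sizeof *grid);
    if (!grid) return -1;

    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            grid[r][c] = (int)(r * cols + c);

    long total = sum_grid(rows, cols, grid);
    free(grid);
    return total;
}
```

    The two-subscript form grid[r][c] is what replaces the hand-written
    r*cols + c mentioned above.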

    --
    Ben.

  • From fir@21:1/5 to Michael S on Wed Mar 20 13:44:04 2024
    Michael S wrote:
    On Wed, 20 Mar 2024 00:30:56 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you may not pass at all just make it
    standalone static variables because they/it is the same for whole
    "branch" (given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.

    same thread safety as previous,

    The same as your previous.
    But I was modifying Malcolm's recursive variant rather than yours.
    Malcolm's was thread-safe.



    sure, i dont always read into peoples code.. i just wanted to say it
    seems you pass this context structure or pointer to it down the stack
    each call - it is not necessary

  • From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 06:51:20 2024
    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 21:43:33 -0700
    Tim Rentsch <[email protected]> wrote:
    [...]

    To me it looks like this recursive algorithm doesn't find all
    pixels that need coloring in some situations.

    Last night I had a few doubts myself, but after further thought I
    came to the conclusion that it works in all situations.

    Sorry, my bad. I did some experiments to convince myself
    the algorithm sometimes doesn't work, but it turns out the
    results showed a problem in the experiments rather than
    the algorithm. :/

  • From Michael S@21:1/5 to fir on Wed Mar 20 15:44:38 2024
    On Wed, 20 Mar 2024 13:44:04 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:30:56 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you may not pass at all just make it
    standalone static variables because they/it is the same for whole
    "branch" (given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.

    same thread safety as previous,

    The same as your previous.
    But I was modifying Malcolm's recursive variant rather than yours.
    Malcolm's was thread-safe.



    sure, i dont always read into peoples code.. i just wanted to say it
    seems you pass this context structure or pointer to it

    Yes, pointer. That's the whole point of my modification of
    Malcolm's code - to copy one pointer instead of 5 variables that
    are never changed.

    down the stack
    each call - it is not necessary

    Not necessary if you don't want it thread-safe. Necessary otherwise.
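
    A sketch of the idea, not the code that was actually posted: the five
    values that never change during one fill are gathered into a struct
    and each recursive call receives one pointer to it. The names
    FillContext, fill_rec and fill are hypothetical. Because the context
    lives in the caller's frame rather than in static variables, two
    threads filling different images do not interfere. The stack-depth
    concern raised elsewhere in this thread still applies.

```c
#include <assert.h>

typedef struct {                 /* hypothetical context type */
    unsigned char *image;
    int width, height;
    unsigned char old_color, new_color;
} FillContext;

/* Recursive 4-neighbour fill; only ctx, x and y are passed down. */
static void fill_rec(const FillContext *ctx, int x, int y)
{
    if (x < 0 || x >= ctx->width || y < 0 || y >= ctx->height) return;
    unsigned char *p = &ctx->image[y * ctx->width + x];
    if (*p != ctx->old_color) return;
    *p = ctx->new_color;
    fill_rec(ctx, x + 1, y);
    fill_rec(ctx, x - 1, y);
    fill_rec(ctx, x, y + 1);
    fill_rec(ctx, x, y - 1);
}

/* Recolor the area containing (x,y); the old color is read from the image. */
static void fill(unsigned char *image, int width, int height,
                 int x, int y, unsigned char new_color)
{
    if (x < 0 || x >= width || y < 0 || y >= height) return;
    FillContext ctx = { image, width, height, image[y * width + x], new_color };
    if (ctx.old_color == new_color) return;   /* nothing to do, avoids endless recursion */
    fill_rec(&ctx, x, y);
}
```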

  • From fir@21:1/5 to Michael S on Wed Mar 20 15:20:37 2024
    Michael S wrote:
    On Wed, 20 Mar 2024 01:13:10 +0100
    fir <[email protected]> wrote:

    i asked the topic here as i felt i got no time to rethink if it will
    blow my program or not but that 30 minutes task was for 30 minutes
    not for a multi hour discussion


    So you got the answer rather quickly and the answer is:
    "Yes, in the worst case it can consume a lot of stack. Don't use this
    simple and elegant algorithm unless you have full control over the
    size of the images, the size of the stack, and the size of the stack
    frame that the compiler generates for each recursive call."


    yes, my conclusion here would rather be

    put the stack at 100 or even 150 MB and forget... then worry if the code
    (of recolorisation) works too slowly

    (i know however this is a potential bug, as if someone then wanted to
    recolorise a very big area there would still be a stack overflow, but
    this is unlikely)

  • From fir@21:1/5 to Michael S on Wed Mar 20 15:17:32 2024
    Michael S wrote:
    On Wed, 20 Mar 2024 13:44:04 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:30:56 +0100
    fir <[email protected]> wrote:

    Michael S wrote:
    On Wed, 20 Mar 2024 00:03:04 +0100
    fir <[email protected]> wrote:
    im not quite sure what you do here.. pass the structure? in fact
    the thing you name context you may not pass at all just make it
    standalone static variables because they/it is the same for whole
    "branch" (given recursive branch of recolorisation)

    something like

    int old_color = 0xff0000;
    int new_color = 0x00ff00;

    void RecolorizePixelAndAdjacentPixels(int x, int y)
    {
    //...
    }



    Not thread-safe.

    same thread safety as previous,

    The same as your previous.
    But I was modifying Malcolm's recursive variant rather than yours.
    Malcolm's was thread-safe.



    sure, i dont always read into peoples code.. i just wanted to say it
    seems you pass this context structure or pointer to it

    Yes, pointer. That's the whole point of my modification of
    Malcolm's code - to copy one pointer instead of 5 variables that
    are never changed.

    down the stack
    each call - it is not necessary

    Not necessary if you don't want it thread-safe. Necessary otherwise.



    okay, if you say so, i dont use threads as i dont like them so i dont
    know (and dont want to know) ;c

  • From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 07:27:38 2024
    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:

    On 19/03/2024 11:18, Michael S wrote:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.

    [...]

    I did a little more investigation gradually modifying Tim's code
    for improved performance without changing the basic principle of
    the algorithm. [...]

    I appreciate your doing this. I developed independently a
    couple of versions along similar lines.

    So far, this algorithm is the fastest among all the "local" algorithms
    that I tried. By "local" I mean algorithms that don't try to
    recolor more than one pixel at a time.

    "Non-local" algorithms i.e. yours and my recursive algorithm that
    recolors St. George cross, are somewhat faster, [...].

    I was confused by this statement at first but now I see that
    "yours" refers to Malcolm's algorithm.

    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    The second nice thing is that it is easy to understand. Not as
    easy as the original recursive method, but easier than the rest of
    them.

    If you or somebody else is interested, here is [micro]optimized
    variant: [...]

    Good show. I will try to get my latest version posted soon.

  • From Tim Rentsch@21:1/5 to Ben Bacarisse on Wed Mar 20 07:52:12 2024
    Ben Bacarisse <[email protected]> writes:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    ...

    Tim Rentsch <[email protected]> writes:

    ...

    static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
    Color );

    Besides, I don't think that use of VLA in library code is a
    good idea. VLA is optional in latest C standards. And
    incompatible with C++.

    The code uses a variably modified type, not a variable length
    array.

    I am not sufficiently versed in C Standard terminology to see a
    difference.

    A VLA is a declared object -- an array with a size that is not a
    compile-time constant. A variably modified type is just a type,
    not an object. Obviously one can use such a type to declare a
    VLA, but when it is the type of a function parameter, there need
    be no declared object with that type. Usually the associated
    function argument will have been dynamically allocated.

    Also ordinary local variables can be declared to have a variably
    modified type (the type not necessarily having been introduced
    separately), often a benefit for code that is dealing with
    multi-dimensional arrays.

    Aren't they both introduced in C99 and made optional in later
    standards?

    I think so but that's a shame since VMTs are very helpful for
    writing array code. They avoid the need to keep calculating the
    index with multiplications.

    C11 added a pre-defined preprocessor macro __STDC_NO_VLA__, which
    implementations can define to be 1 "intended to indicate that the
    implementation does not support variable length arrays or variably
    modified types." It's amusing to note that an implementation can
    support VLAs and VMTs but still define the macro if they are not
    intended to be supported. ;)

    Making both optional was a classic case of throwing the baby out
    with the bath water. Few of the objections raised about VLAs
    apply to VMTs.

    Agree 100%.

    Someone who wants to take a stand on this issue might consider
    adding the following lines

    #if __STDC_NO_VLA__
    #error Substandard implementation detected
    #endif

    at various places around their source code.

  • From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 10:01:10 2024
    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.
    [code]

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.

    Slower with some shapes, faster in others.

    In my small test suite I found no cases where this specific code is
    measurably faster than Malcolm's code.

    My test cases include pixel fields of 32k by 32k, with for
    example filling the entire field starting at the center point.
    Kind of a stress test but it turned up some interesting results.

    I did find one case in which they are approximately equal. I call
    it the "slalom shape", and it's more or less designed to be the worst
    case for algorithms that try to speed themselves up by taking
    advantage of straight lines.
    The slalom shape is generated by the following code:
    [code]

    Thanks, I may try that.

    In any case
    the code was written for clarity of presentation, with
    no attention paid to low-level performance.

    Yes, your code is easy to understand. It could have been easier
    still if the persistent indices had more meaningful names.

    I have a different view on that question. However I take your
    point.

    In another post I showed an optimized variant of your algorithm:
    - 4-neighbors loop unrolled. Most of the speed-up comes not from
    the unrolling itself, but from the specialization of the in-rectangle
    check that the unrolling enables.
    - Todo queue implemented as a circular buffer.
    - Initial size of the queue increased.
    This optimized variant is more competitive with 'line-grabby'
    algorithms in filling solid shapes and faster than them in the
    'slalom' case.

    Yes, unrolling is an obvious improvement. I deliberately chose a
    simple (and non-optimized) method to make it easier to see how it
    works. Simple optimizations are left as an exercise for the
    reader. :)

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack; my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).
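
    A small experiment that makes the difference visible, offered as a
    sketch rather than as a benchmark. It fills an empty 64 x 64 field
    from the centre and records the largest number of pending points,
    once taking points in queue order and once in stack order. The names
    fill_peak, field and todo are made up for the illustration, and a
    pixel is marked when it is added, so it is never pending twice.

```c
#include <assert.h>
#include <string.h>

enum { N = 64 };                      /* field is N x N (assumed size) */

typedef struct { int x, y; } Pt;

static unsigned char field[N][N];     /* 0 = not yet reached, 1 = reached */
static Pt todo[N * N];                /* each pixel is added at most once */

/* Fill the whole field from the centre. use_queue selects first-in
   first-out (1) or last-in first-out (0) order. Returns the largest
   number of pending points observed. */
static int fill_peak(int use_queue)
{
    static const Pt d[4] = { {1,0}, {-1,0}, {0,1}, {0,-1} };
    int head = 0, tail = 0, peak = 0;

    memset(field, 0, sizeof field);
    field[N/2][N/2] = 1;
    todo[tail++] = (Pt){ N/2, N/2 };

    while (head < tail) {
        if (tail - head > peak) peak = tail - head;
        /* queue: take the oldest point; stack: take the newest one */
        Pt p = use_queue ? todo[head++] : todo[--tail];
        for (int i = 0; i < 4; i++) {
            int x = p.x + d[i].x, y = p.y + d[i].y;
            if (x < 0 || x >= N || y < 0 || y >= N || field[x][y]) continue;
            field[x][y] = 1;          /* mark when added, not when taken */
            todo[tail++] = (Pt){ x, y };
        }
    }
    return peak;
}
```

    In queue order the pending set is at most two neighbouring "rings"
    around the start, which stays within a small multiple of N. In stack
    order the walk snakes through the field and leaves unvisited siblings
    behind on every step, so the pending set grows much larger.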

    Besides, I don't think that use of VLA in library code is a good
    idea. VLA is optional in latest C standards. And incompatible
    with C++.

    The code uses a variably modified type, not a variable length
    array.

    I am not sufficiently versed in C Standard terminology to see a
    difference.
    Aren't they both introduced in C99 and made optional in later
    standards?

    Ben explained the difference. I posted a short followup to his
    explanation. And yes, as of C11 VLAs and VMTs are both optional
    (it would be nice if a new C standard put back the requirement
    of variably modified types).

    Again, the choice is for clarity of presentation. If
    someone wants to get rid of the variably modified types, it's
    very easy to do, literally a five minute task.

    Yes, that's what it took for me.
    But I knew that variably modified types exist, even if I didn't know
    that they are called such.
    OTOH, many (the majority?) of C programmers have never heard of them.

    Something that surprised me is that some C programmers don't
    know what compound literals are, even though they have been
    around more than 20 years. I'm not inclined to try to cater
    to people who program in C but aren't at least aware of what
    was done more than 20 years ago.

    Anyway the interface is poorly designed to start with, [...]

    That's true. [...]

    Yes! Hoo rah!

    If someone wants to use the functionality from C++, it's
    easy enough to write a C wrapper function to do that.
    IMO C++ has diverged sufficiently from C so that there
    is little to be gained by trying to make code interoperable
    between the two languages.

    From the practical perspective, the biggest obstacle is that your
    code can't be compiled with popular Microsoft compilers.

    Some people might consider that a plus rather than a minus. ;)

  • From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 10:26:58 2024
    Michael S <[email protected]> writes:

    [...]

    I did a little more investigation gradually modifying Tim's code
    for improved performance without changing the basic principle of
    the algorithm. [...]

    Here is a rendition of my latest and fastest refinement.

    #include <stdlib.h>
    #include <string.h>

    typedef unsigned char UC;
    typedef unsigned UI;
    typedef unsigned U32;
    typedef unsigned long long U64;
    typedef struct { UI x, y; } Point;

    void
    faster_fill( UI w0, UI h0, UC pixels[], Point p0, UC old, UC new ){
        U64 const w = w0;
        U64 const h = h0;
        U64 const xm = w-1;
        U64 const ym = h-1;

        U64 j = 0;              /* index of the next point to take */
        U64 k = 0;              /* index of the next free slot */
        U64 n = 1u << 10;       /* buffer size, always a power of two */
        U64 m = n-1;            /* index mask */
        U32 *todo = malloc( n * sizeof *todo );
        U64 x = p0.x;
        U64 y = p0.y;

        if( !todo || x >= w || y >= h || old == new || pixels[ x*h+y ] != old ){
            free( todo );
            return;
        }

        /* color the start pixel here; the loop below only colors neighbors */
        pixels[ x*h+y ] = new;
        todo[ k++ ] = x<<16 | y;

        while( j != k ){
            U64 used = j < k ? k-j : k+n-j;
            U64 open = n - used;
            if( open < used / 16 ){
                /* buffer nearly full: double it and move the wrapped part up */
                U64 new_n = n*2;
                U64 new_m = new_n-1;
                U64 new_j = j < k ? j : j+n;
                U32 *t = realloc( todo, new_n * sizeof *t );
                if( ! t ) break;
                if( j != new_j ) memcpy( t+new_j, t+j, (n-j) * sizeof *t );
                todo = t, n = new_n, m = new_m, j = new_j, open = n-used;
            }

            U64 const jx = used <= 3*open ? k : j+open/3 &m;
            while( j != jx ){
                UI p = todo[j]; j = j+1 &m;
                x = p >> 16, y = p & 0xFFFF;
                if( x > 0 && pixels[ x*h-h + y ] == old ){
                    todo[k] = x-1<<16 | y, k = k+1&m, pixels[ x*h-h +y ] = new;
                }
                if( y > 0 && pixels[ x*h + y-1 ] == old ){
                    todo[k] = x<<16 | y-1, k = k+1&m, pixels[ x*h +y-1 ] = new;
                }
                if( x < xm && pixels[ x*h+h + y ] == old ){
                    todo[k] = x+1<<16 | y, k = k+1&m, pixels[ x*h+h +y ] = new;
                }
                if( y < ym && pixels[ x*h + y+1 ] == old ){
                    todo[k] = x<<16 | y+1, k = k+1&m, pixels[ x*h +y+1 ] = new;
                }
            }
        }

        free( todo );
    }

  • From Michael S@21:1/5 to Tim Rentsch on Thu Mar 21 15:36:45 2024
    On Wed, 20 Mar 2024 10:26:58 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    I did a little more investigation gradually modifying Tim's code
    for improved performance without changing the basic principle of
    the algorithm. [...]

    Here is a rendition of my latest and fastest refinement.


    WOW, you really opened up your bag of tricks!
    Power-of-two sized circular buffers are something that I tend to use on
    smaller systems, like DSPs or MCUs, rather than on "big" computers. But,
    of course, on "big" computers they also help.
    Packing {x,y} into a 32-bit word is a bit dirty. I'd guess that we can
    justify it by claiming that the original code also has a similar
    limitation of width*height <= INT_MAX.
    Removal of the FIFO empty and almost-full tests from the inner loop
    helps solid shapes, but appears to slow down "drawn" shapes. Since solid
    shapes are the slowest to fill, it is probably a good trade-off.

    Overall, it is faster than my implementation of your algorithm. Esp. so
    for solid shapes. Esp. of esp. so on Intel Skylake CPUs where speed up
    is up to 1.75x.

    More complicated 'St. George Cross' algorithms are still faster for
    solid shapes and for shapes dominated by long horizontal or long
    vertical lines. But they are ... well ... more complicated.
    And [on Skylake] their worst case ('slalom' shape) is somewhat slower
    in absolute sense than the worst case of your code (a solid bar).

  • From Tim Rentsch@21:1/5 to Michael S on Thu Mar 21 09:47:15 2024
    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:26:58 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    I did a little more investigation gradually modifying Tim's code
    for improved performance without changing the basic principle of
    the algorithm. [...]

    Here is a rendition of my latest and fastest refinement.

    WOW, you really opened up your bag of tricks!

    That I did, that I did. :)

    I can do this kind of stuff when I need to. Usually I don't need
    to.

    Power-of-two sized circular buffers are something that I tend to
    use on smaller systems, like DSPs or MCUs, rather than on "big"
    computers. But, of course, on "big" computers they also help.

    Bitwise '&' is simply faster than '%'. Also, bitwise '&' works
    on unsigned types in the event that there is wraparound, but '%'
    probably doesn't.
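
    One way to read the wraparound remark, sketched under an assumption
    that differs from the posted code: suppose the buffer indices are
    free-running 32-bit counters that are reduced to a slot only when the
    buffer is accessed. The helper names slot_mask, slot_mod and
    wrap_is_seamless are invented for the illustration. With a
    power-of-two size the slot sequence continues smoothly when the
    counter wraps from its maximum to 0; with any other size, reduction
    by '%' jumps at that point.

```c
#include <assert.h>
#include <stdint.h>

/* Slot for a free-running counter in a ring of n slots; n must be a power of two. */
static uint32_t slot_mask(uint32_t counter, uint32_t n)
{
    return counter & (n - 1);
}

/* Same mapping using the remainder operator; accepts any n > 0. */
static uint32_t slot_mod(uint32_t counter, uint32_t n)
{
    return counter % n;
}

/* Returns 1 if stepping the counter from UINT32_MAX to 0 moves to the
   next slot of the ring, 0 if the slot sequence jumps. */
static int wrap_is_seamless(uint32_t n, uint32_t (*slot)(uint32_t, uint32_t))
{
    uint32_t before = slot(UINT32_MAX, n);
    uint32_t after  = slot((uint32_t)(UINT32_MAX + 1u), n);  /* counter wraps to 0 */
    return after == (before + 1u) % n;
}
```

    The posted code sidesteps the question by masking after every
    increment, so its indices never leave the range 0..n-1.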

    Packing {x,y} into a 32-bit word is a bit dirty. I'd guess that we
    can justify it by claiming that the original code also has a similar
    limitation of width*height <= INT_MAX.

    Yes, it is a bit dirty. In practice pixel fields almost never get
    above 16 bits in each direction, and the code is easy enough to
    change (by putting two 32-bit quantities into a 64-bit type) if
    it becomes necessary to accommodate an enormous pixel field.
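
    A sketch of that change, with hypothetical names (Packed, pack_xy,
    packed_x, packed_y): each coordinate gets a full 32 bits inside a
    64-bit value, so neither is limited to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Packed;      /* hypothetical name: x in the high half, y in the low half */

static Packed pack_xy(uint32_t x, uint32_t y)
{
    return (Packed)x << 32 | y;
}

static uint32_t packed_x(Packed p)
{
    return (uint32_t)(p >> 32);
}

static uint32_t packed_y(Packed p)
{
    return (uint32_t)(p & 0xFFFFFFFFu);
}
```

    The todo buffer would then hold Packed values instead of 32-bit ones,
    which doubles its memory use for the same number of pending points.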

    Removal of the FIFO empty and almost-full tests from the inner loop
    helps solid shapes, but appears to slow down "drawn" shapes. Since
    solid shapes are the slowest to fill, it is probably a good trade-off.

    Taking those tests out of the inner loop helps when there is a big
    frontier set, because the tests don't have to be done as often.
    When the frontier set is small, as we would expect for long
    skinny shapes, doing that doesn't help as much (and of course
    other overhead may make it worse in such cases).

    Overall, it is faster than my implementation of your algorithm.
    Esp. so for solid shapes. Esp. of esp. so on Intel Skylake CPUs
    where speed up is up to 1.75x.

    More complicated 'St. George Cross' algorithms are still faster
    for solid shapes and for shapes dominated by long horizontal or
    long vertical lines. But they are ... well ... more complicated.
    And [on Skylake] their worst case ('slalom' shape) is somewhat
    slower in absolute sense than the worst case of your code (a solid
    bar).

    I played around with a "non-local" (in your terminology) version
    of my most recently posted code, and discovered some things.
    First the non-local version is somewhat faster on some shapes,
    but noticeably slower on others. The non-local version is more
    sensitive to which starting point is chosen. In a way it looks
    similar to what happens with compression algorithms - some cases
    get better, others get decidedly worse. I didn't do a lot of
    experiments in an effort to determine what the range or relative
    proportions of the different behaviors are.

    After thinking about it a bit, it seems to me that a local-only
    method is "more queue-like" and a non-local method is "more
    stack-like". Using a pure queue plods along very predictably,
    never getting much better or much worse. Being more stack-like
    sometimes gets a speedup, but sometimes stumbles into the pit of
    despair, which in the worse case needs a lot of memory and does
    more memory shuffling. So for a general algorithm I'd opt for a
    local-only method. Later on it might be good to use a more
    tailored algorithm for special cases, if we can identify which
    cases are special in a way that isn't too expensive.

  • From Peter 'Shaggy' Haywood@21:1/5 to All on Fri Mar 22 13:04:39 2024
    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was slightly thinking a bit of this recursion more generally and
    i observed that those very long depth chains are kinda problem of this
    recursion because maybe it is more fitted to be run parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give you
    this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled iteratively
    (in a simple loop) from left to right. Recursion from line to line
    completes the algorithm. Thus, the recursion level is greatly reduced.
    And you should find that this approach fills an area of any shape.
    Note, however, that for some pathological cases (very large and
    complex shapes), this can still create a fairly large level of
    recursion. Maybe a more complex approach can deal with this. What I
    present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called
    r_floodfill here, for recursive floodfill) does the actual work, but is
    only called by floodfill(). It goes something like this (although this
    is incomplete, untested and not code I've actually used, just an
    example):

    /* Assumes get_pixel() is bounds-checked: for coordinates outside the
       image (including start - 1 wrapping around when start is 0) it must
       return a value that never equals old_clr. */
    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
                            pixel_t old_clr)
    {
        unsigned start, end;

        /* Find start and end of line within floodfill area. */
        start = end = x;
        while(old_clr == get_pixel(y, start - 1))
            --start;
        while(old_clr == get_pixel(y, end + 1))
            ++end;

        /* Fill line with new colour. */
        for(x = start; x <= end; x++)
            set_pixel(y, x, new_clr);

        /* Run along again, checking pixel colours above and below,
           and recursing if appropriate. */
        for(x = start; x <= end; x++)
        {
            if(old_clr == get_pixel(y - 1, x))
                r_floodfill(y - 1, x, new_clr, old_clr);
            if(old_clr == get_pixel(y + 1, x))
                r_floodfill(y + 1, x, new_clr, old_clr);
        }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
        pixel_t old_clr = get_pixel(y, x);

        /* Only proceed if colours differ. */
        if(new_clr != old_clr)
            r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the
    starting point for the fill (y and x) and the fill colour (new_clr).
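The post does not show get_pixel() and set_pixel(), and the line scan relies on them behaving sensibly at the image edges (for example when start - 1 wraps around at column 0). One possible pair of helpers is sketched below. Everything here is an assumption made for illustration: the fixed 8x8 image, the pixel_t typedef, and the reserved OUTSIDE value that is never used as a real colour.

```c
#include <assert.h>

/* Hypothetical image storage, not part of the original post. */
typedef unsigned char pixel_t;
enum { IMG_W = 8, IMG_H = 8 };
static pixel_t image[IMG_H][IMG_W];

/* Reserved value returned for coordinates outside the image. */
#define OUTSIDE ((pixel_t)0xFF)

/* Unsigned comparison also rejects wrapped values such as 0u - 1. */
pixel_t get_pixel(unsigned y, unsigned x)
{
    return (y < IMG_H && x < IMG_W) ? image[y][x] : OUTSIDE;
}

void set_pixel(unsigned y, unsigned x, pixel_t clr)
{
    assert(clr != OUTSIDE);          /* keep the sentinel out of the image */
    if (y < IMG_H && x < IMG_W)
        image[y][x] = clr;
}
```

With helpers like these the scan loops stop at the border because OUTSIDE never compares equal to old_clr.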

    --


    ----- Dig the NEW and IMPROVED news sig!! -----


    -------------- Shaggy was here! ---------------
    Ain't I'm a dawg!!

  • From Michael S@21:1/5 to Peter 'Shaggy' Haywood on Fri Mar 22 17:55:26 2024
    On Fri, 22 Mar 2024 13:04:39 +1100
    Peter 'Shaggy' Haywood <[email protected]> wrote:

    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was slightly thinking a bit of this recursion more generally and
    i observed that those very long depth chains are kinda problem of
    this recursion because maybe it is more fitted to be run parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give
    you this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled
    iteratively (in a simple loop) from left to right. Recursion from
    line to line completes the algorithm. Thus, the recursion level is
    greatly reduced. And you should find that this approach fills an area
    of any shape. Note, however, that for some pathological cases (very
    large and complex shapes), this can still create a fairly large level
    of recursion. Maybe a more complex approach can deal with this. What I
    present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called
    r_floodfill here, for recursive floodfill) does the actual work, but
    is only called by floodfill(). It goes something like this (although
    this is incomplete, untested and not code I've actually used, just an
    example):

    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
    pixel_t old_clr)
    {
    unsigned start, end;

    /* Find start and end of line within floodfill area. */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
    --start;
    while(old_clr == get_pixel(y, end + 1))
    ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
    set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
    and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
    if(old_clr == get_pixel(y - 1, x))
    r_floodfill(y - 1, x, new_clr, old_clr);
    if(old_clr == get_pixel(y + 1, x))
    r_floodfill(y + 1, x, new_clr, old_clr);
    }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
    r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the
    starting point for the fill (y and x) and the fill colour (new_clr).


    It looks like anisotropic variant of my St. George Cross algorithm.
    Or like recursive variant of Malcolm's algorithm.
    Being anisotropic, it has higher amount of glass jaws. In particular,
    it would be very slow for not uncommon 'jail window' patterns.
    * *** *** *** ***
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    *** *** *** *** *

    Also, implementation is still recursive and the worst-case recursion
    depth is still O(N), where N is total number of recolored pixels, so
    unlike many other solutions presented here, you didn't solve fir's
    original problem.
    And in presented form it's not thread-safe. Which is not a problem for
    fir, but undesirable for the rest of us.

    Conclusion: sorry, you aren't going to get a cookie for your effort.

  • From Michael S@21:1/5 to Michael S on Fri Mar 22 18:31:16 2024
    On Fri, 22 Mar 2024 17:55:26 +0300
    Michael S <[email protected]> wrote:

    On Fri, 22 Mar 2024 13:04:39 +1100
    Peter 'Shaggy' Haywood <[email protected]> wrote:


    To use this, simply call floodfill() passing the coordinates of
    the starting point for the fill (y and x) and the fill colour
    (new_clr).

    It looks like anisotropic variant of my St. George Cross algorithm.
    Or like recursive variant of Malcolm's algorithm.
    Being anisotropic, it has higher amount of glass jaws. In particular,
    it would be very slow for not uncommon 'jail window' patterns.
    * *** *** *** ***
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    *** *** *** *** *

    Also, implementation is still recursive and the worst-case recursion
    depth is still O(N), where N is total number of recolored pixels, so
    unlike many other solutions presented here, you didn't solve fir's
    original problem.
    And in presented form it's not thread-safe. Which is not a problem for
    fir, but undesirable for the rest of us.

    Conclusion: sorry, you aren't going to get a cookie for your effort.



    So, what is my own practical answer?
    Assuming that speed is not a top priority, that simplicity is pretty
    high on the priority scale, and that it should work with big images
    and the default stack size under Windows, I will go with the following
    not particularly fast and not particularly slow algorithm that I call
    "deferred stack". That is, it's mostly an explicit stack, but the
    (explicit) recursion is deferred until all four neighbors of the
    current pixel are saved on the todo stack.
    "Not particularly slow" means that I did see cases where some other
    algorithm is 2 times faster, but have never seen a 3x difference.
    In case x and y are known to fit in uint16_t, the UI type could be
    redefined accordingly. It will make execution faster, but not by much.

    #include <stdlib.h>
    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;
    typedef int UI;

    /* Returns 1 if something was recolored, 0 if there was nothing to do,
       -1 if memory allocation failed. */
    int floodfill_r(
      Color *image,
      int width,
      int height,
      int x0,
      int y0,
      Color old,
      Color new)
    {
      if (width < 0 || height < 0)
        return 0;

      if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;

      /* Guard added: with old == new the loop below would never finish. */
      if (old == new)
        return 0;

      size_t x = x0;
      size_t y = y0;
      if (image[y*width+x] != old)
        return 0;

      const ptrdiff_t INITIAL_TODO_SIZE = 128;
      struct Point { UI x, y; } ;
      struct Point *todo = malloc(INITIAL_TODO_SIZE * sizeof *todo );
      if (!todo)
        return -1;
      struct Point* todo_end = &todo[INITIAL_TODO_SIZE];

      todo[0].x = x; todo[0].y = y;
      struct Point* sp = &todo[1];
      do {
        /* Pop the most recently saved point. */
        x = sp[-1].x; y = sp[-1].y;
        --sp;
        if (image[y*width+x] == old) {
          image[y*width+x] = new;
          /* Save every neighbor that still has the old color. */
          if (x > 0 && image[y*width+x-1] == old) {
            sp->x = x - 1; sp->y = y; ++sp;
          }
          if (y > 0 && image[y*width+x-width] == old) {
            sp->x = x; sp->y = y - 1; ++sp;
          }
          if (x+1 < width && image[y*width+x+1] == old) {
            sp->x = x + 1; sp->y = y; ++sp;
          }
          if (y+1 < height && image[y*width+x+width] == old) {
            sp->x = x; sp->y = y + 1; ++sp;
          }

          /* Keep room for the (up to) four pushes of the next iteration. */
          if (todo_end-sp < 4) {
            ptrdiff_t used = sp-todo;
            ptrdiff_t size = todo_end - todo;
            size += size/4;
            struct Point* new_todo = realloc(todo, size * sizeof *todo );
            if(!new_todo) {
              free(todo);
              return -1;
            }
            todo = new_todo;
            sp = &todo[used];
            todo_end = &todo[size];
          }
        }
      } while (sp != todo);

      free( todo );
      return 1;
    }

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Mar 23 00:21:00 2024
    Malcolm McLean <[email protected]> writes:

    On 17/03/2024 11:25, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    Recursion make programs harder to reason about and prove correct.
    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    Example given. A recursive algorithm which is hard to reason about and
    prove correct, because we don't really know whether under perfectly
    reasonable assumptions it will or will not blow the stack.
    Had you offered a proof that your code neither "blows the stack" nor
    runs out of any other resource we'd have a starting point for
    comparison, but you have not done that.
    Mind you, had you done that, we would have something that might
    eventually become only one piece of evidence for what is an
    astonishingly general remark. Broadly applicable remarks require either
    broadly applicable evidence or a wealth of distinct cases.
    Your "rule" suggests that all reasoning is impeded by the presence of
    recursion and I don't think you can support that claim. This is
    characteristic of many of your remarks -- they are general "rules" that
    often remain rules even when there is evidence to the contrary.
    I'll make another point in the hope of clarifying the matter. An
    algorithm or code is usually proved correct (or not!) under the
    assumption that it has adequate resources -- usually time and storage.
    Further reasoning may then be done to determine the resource
    requirements since this is so often dependent on context. This
    separation is helpful as you don't usually want to tie "correctness" to
    some specific installation. The code might run on a system with a
    dynamically allocated stack, for example, that has very similar
    limitations to "heap" memory.
    To put it more generally, we often want to prove properties of code that
    are independent of physical constraints. Your remark includes this kind
    of reasoning. Did you intend it to?

    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited whilst stacks are not.

    I put off answering for enough time that I now don't care anymore. I
    think that's a win for everyone.

    --
    Ben.

  • From fir@21:1/5 to Peter 'Shaggy' Haywood on Sat Mar 23 11:06:42 2024
    Peter 'Shaggy' Haywood wrote:
    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was slightly thinking a bit of this recursion more generally and
    i observed that those very long depth chains are kinda problem of this
    recursion because maybe it is more fitted to be run parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give you
    this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled iteratively
    (in a simple loop) from left to right. Recursion from line to line
    completes the algorithm. Thus, the recursion level is greatly reduced.
    And you should find that this approach fills an area of any shape.
    Note, however, that for some pathological cases (very large and
    complex shapes), this can still create a fairly large level of
    recursion. Maybe a more complex approach can deal with this. What I
    present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called
    r_floodfill here, for recursive floodfill) does the actual work, but is
    only called by floodfill(). It goes something like this (although this
    is incomplete, untested and not code I've actually used, just an
    example):

    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr, pixel_t old_clr)
    {
    unsigned start, end;

    /* Find start and end of line within floodfill area. */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
    --start;
    while(old_clr == get_pixel(y, end + 1))
    ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
    set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
    and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
    if(old_clr == get_pixel(y - 1, x))
    r_floodfill(y - 1, x, new_clr, old_clr);
    if(old_clr == get_pixel(y + 1, x))
    r_floodfill(y + 1, x, new_clr, old_clr);
    }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
    r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the
    starting point for the fill (y and x) and the fill colour (new_clr).


    well this is ok improvement for consideration - i however resolved the
    problem even in 3 ways as you could note reading more carefully
    1) put a stack to 100 MB and forget
    2) i wrote a straightforward iterative version (in draft) (this with
    AddPixel(.. )
    3) i noticed that the best method would be to introduce a so called
    call queue in c (probably best solution imo)

  • From Scott Lurndal@21:1/5 to All on Sat Mar 23 14:43:49 2024
    Malcolm McLean <[email protected]> writes:


    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited whilst stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.
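For readers who want to check this on their own system, here is a minimal POSIX sketch that queries the soft stack limit with getrlimit(), the read-side companion of setrlimit(). The helper names stack_soft_limit and print_stack_limit are invented for this example; RLIMIT_STACK and RLIM_INFINITY are the standard constants from <sys/resource.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Stores the soft stack limit in *soft_out.
   Returns 1 if it is unlimited, 0 if it is finite, -1 on error. */
int stack_soft_limit(rlim_t *soft_out)
{
    struct rlimit rl;

    assert(soft_out != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft_out = rl.rlim_cur;
    return rl.rlim_cur == RLIM_INFINITY;
}

/* Prints the limit in a form similar to what 'ulimit -s' reports. */
void print_stack_limit(void)
{
    rlim_t soft;
    int r = stack_soft_limit(&soft);

    if (r < 0)
        perror("getrlimit");
    else if (r)
        puts("stack: unlimited");
    else
        printf("stack: %llu bytes\n", (unsigned long long)soft);
}
```

Whether "unlimited" is available, and what actually happens when the stack keeps growing, depends on the system and its configuration.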

  • From Tim Rentsch@21:1/5 to Scott Lurndal on Sat Mar 23 11:48:16 2024
    [email protected] (Scott Lurndal) writes:

    Malcolm McLean <[email protected]> writes:

    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

  • From Michael S@21:1/5 to Tim Rentsch on Sun Mar 24 19:33:52 2024
    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Tim Rentsch <[email protected]> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.
    [code]

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.

    Slower with some shapes, faster in others.

    In my small test suite I found no cases where this specific code is
    measurably faster than code of Malcolm.

    My test cases include pixel fields of 32k by 32k, with for
    example filling the entire field starting at the center point.
    Kind of a stress test but it turned up some interesting results.

    I did find one case in which they are approximately equal. I call
    it "slalom shape" and it's more or less designed to be the worst
    case for algorithms that are trying to speed themselves up by taking
    advantage of straight lines.
    The slalom shape is generated by following code:
    [code]

    Thanks, I may try that.

    In any case
    the code was written for clarity of presentation, with
    no attention paid to low-level performance.

    Yes, your code is easy to understand. Could have been easier
    still if persistent indices had more meaningful names.

    I have a different view on that question. However I take your
    point.

    In another post I showed an optimized variant of your algorithm:
    - 4-neighbors loop unrolled. The majority of the speedup comes not
      from unrolling itself, but from specialization of the in-rectangle
      check enabled by the unroll.
    - Todo queue implemented as circular buffer.
    - Initial size of queue increased.
    This optimized variant is more competitive with 'line-grabby'
    algorithms in filling solid shapes and faster than them in
    'slalom' case.

    Yes, unrolling is an obvious improvement. I deliberately chose a
    simple (and non-optimized) method to make it easier to see how it
    works. Simple optimizations are left as an exercise for the
    reader. :)

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).


    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if existence of this or similar limit
    can be proven theoretically.
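One way to probe the question empirically is to instrument a breadth-first fill so that it reports its peak queue occupancy. The sketch below is a generic queue-based fill written for this purpose, not Tim's code: the function name bfs_fill_peak is made up, pixels are recolored when they are enqueued (so each is queued at most once), and the queue is simply sized at width*height entries to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Breadth-first fill that returns the peak number of queued pixels,
   0 if there was nothing to do, or -1 on bad arguments or allocation
   failure. Illustrative only. */
long bfs_fill_peak(unsigned char *img, int width, int height,
                   int x0, int y0, unsigned char new_clr)
{
    if (width <= 0 || height <= 0 ||
        x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return -1;

    size_t w = (size_t)width, h = (size_t)height;
    size_t start = (size_t)y0 * w + (size_t)x0;
    unsigned char old_clr = img[start];
    if (old_clr == new_clr)
        return 0;

    size_t cap = w * h;              /* each pixel is queued at most once */
    size_t *queue = malloc(cap * sizeof *queue);
    if (!queue)
        return -1;

    size_t head = 0, tail = 0, peak = 0;
    img[start] = new_clr;
    queue[tail++] = start;

    while (head < tail) {
        if (tail - head > peak)
            peak = tail - head;      /* current frontier size */
        size_t p = queue[head++];
        size_t x = p % w, y = p / w;

        if (x > 0 && img[p - 1] == old_clr)     { img[p - 1] = new_clr; queue[tail++] = p - 1; }
        if (x + 1 < w && img[p + 1] == old_clr) { img[p + 1] = new_clr; queue[tail++] = p + 1; }
        if (y > 0 && img[p - w] == old_clr)     { img[p - w] = new_clr; queue[tail++] = p - w; }
        if (y + 1 < h && img[p + w] == old_clr) { img[p + w] = new_clr; queue[tail++] = p + w; }
    }
    assert(tail <= cap);
    free(queue);
    return (long)peak;
}
```

Running it over different shapes and start points and comparing the returned peak against min(width,height)*2+2 gives data, though of course not a proof.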

  • From Michael S@21:1/5 to Tim Rentsch on Sun Mar 24 20:27:58 2024
    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:


    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.


    To be fair, none of presented algorithms is abysmally slow.
    When compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.
    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it happens
    it's because of problematic implementation rather than because of
    differences in algorithms.
    Even original naive recursive algorithm is not too slow when
    implemented in optimized asm - 2.2x slower than the fastest for solid
    square shape and closer than that for other shapes.

    The big difference between algorithms is not speed, but the amount of
    auxiliary memory used in the worst case. Your algorithm appears to be
    the best in that department, Malcolm's algorithm is also quite good
    and all others (plain recursion, stacks, my deferred stack, all my
    cross variants, the line-oriented recursion of Peter 'Shaggy'
    Haywood) are a lot worse.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    For example, solid 200x200 image with starting point in the corner
    recolored by code presented in first Malcolm's post (not his own
    algorithm, but recursive algorithm that he presented as a reference
    point) on x86-64/gcc consumes 5,094,784 bytes of stack. After small
    modification (all non-changing parameters aggregated in structure
    and passed by reference) the footprint falls to 2,547,328 B.
    Coding the same algorithm (well, almost the same) in asm reduces it to
    320,000 B. Coding it with explicit stack cuts it to 40,000 B. Now, I
    didn't actually code it, but I know how to compress explicit stack
    down to 2 bits per level of recursion. If implemented, it would be
    10,000 B, i.e. comparable with much more economical algorithm of Malcolm
    and 512x smaller than original implementation of [well, almost] the
    same algorithm!
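The post does not say how the 2-bit encoding would work. One plausible reading, offered here purely as an assumption, is that each level only needs to remember which of the four directions was taken, so that backtracking can undo the step and resume with the next direction. A packed stack of 2-bit entries could look like the sketch below (DirStack and its functions are hypothetical names). For 200x200 = 40,000 levels it needs 40,000 * 2 / 8 = 10,000 bytes, which matches the figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed stack: four 2-bit direction codes per byte. */
typedef struct {
    uint8_t *bits;
    size_t count;      /* entries currently stored */
    size_t capacity;   /* maximum number of entries */
} DirStack;

static size_t dirstack_bytes(size_t capacity)
{
    return (capacity + 3) / 4;
}

int dirstack_init(DirStack *s, size_t capacity)
{
    s->bits = calloc(dirstack_bytes(capacity), 1);
    s->count = 0;
    s->capacity = s->bits ? capacity : 0;
    return s->bits ? 0 : -1;
}

void dirstack_push(DirStack *s, unsigned dir)
{
    assert(dir < 4 && s->count < s->capacity);
    size_t byte = s->count / 4;
    unsigned shift = (unsigned)(s->count % 4) * 2;
    /* Clear the 2-bit slot, then store the direction code. */
    s->bits[byte] = (uint8_t)((s->bits[byte] & ~(3u << shift)) | (dir << shift));
    s->count++;
}

unsigned dirstack_pop(DirStack *s)
{
    assert(s->count > 0);
    s->count--;
    return (s->bits[s->count / 4] >> ((s->count % 4) * 2)) & 3u;
}
```

A fill built on this would push the direction on each step forward and, on a dead end, pop it, step back the opposite way, and continue with the remaining directions.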

  • From Tim Rentsch@21:1/5 to Michael S on Sun Mar 24 10:24:45 2024
    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if existence of this or similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    As for finding a worst case, try this (expressed in pseudo code):

    let pc = { width/2, height/2 }
    // assume pixel field 'field' starts out as all zeroes
    color 8 "legs" with the value '1' as follows:

    leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
    leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
    leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
    leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

    leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
    leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
    leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
    leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }


    So the pixel field should look like this (with longer legs for a
    bigger pixel field), with '-' being 0 and '*' being 1:

    +-----------------------+
    | - - - - - - - - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - * * * * - * * * * - |
    | - - - - - - - - - - - |
    | - * * * * - * * * * - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - - - - - - - - |
    +-----------------------+

    Now start coloring at the center point with a new value
    of 2.
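The pseudo code above can be turned into a small C generator along the following lines. This is a sketch under stated assumptions: the function names are invented, the field is a row-major array of bytes, and the lower vertical legs are taken to run down to height-2 (by symmetry with width-2 and in agreement with the diagram).

```c
#include <assert.h>
#include <string.h>

/* Draws a horizontal run of 1s on row y, columns x0..x1 inclusive. */
static void hleg(unsigned char *f, int width, int y, int x0, int x1)
{
    for (int x = x0; x <= x1; x++)
        f[y * width + x] = 1;
}

/* Draws a vertical run of 1s in column x, rows y0..y1 inclusive. */
static void vleg(unsigned char *f, int width, int x, int y0, int y1)
{
    for (int y = y0; y <= y1; y++)
        f[y * width + x] = 1;
}

/* Fills 'field' (width*height bytes) with the 8-leg test pattern. */
void make_eight_legs(unsigned char *field, int width, int height)
{
    assert(width >= 5 && height >= 5);   /* smaller fields have no room */
    int cx = width / 2, cy = height / 2;

    memset(field, 0, (size_t)width * (size_t)height);

    hleg(field, width, cy - 1, 1, cx - 1);
    hleg(field, width, cy + 1, 1, cx - 1);
    hleg(field, width, cy - 1, cx + 1, width - 2);
    hleg(field, width, cy + 1, cx + 1, width - 2);

    vleg(field, width, cx - 1, 1, cy - 1);
    vleg(field, width, cx + 1, 1, cy - 1);
    vleg(field, width, cx - 1, cy + 1, height - 2);
    vleg(field, width, cx + 1, cy + 1, height - 2);
}
```

For an 11 x 11 field this reproduces the diagram; the fill under test is then started at { width/2, height/2 } with the new value 2.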

  • From Tim Rentsch@21:1/5 to Michael S on Sun Mar 24 13:26:16 2024
    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather than
    because of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount of
    auxiliary memory used in the worst case. Your algorithm appears to be
    the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

  • From Scott Lurndal@21:1/5 to Tim Rentsch on Sun Mar 24 20:48:52 2024
    Tim Rentsch <[email protected]> writes:
    [email protected] (Scott Lurndal) writes:

    Malcolm McLean <[email protected]> writes:

    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

    I was referring to the term of art used in POSIX, where
    unlimited simply means that the operating system doesn't
    limit them (and as I pointed out, physical limits, including
    address space size (which is often only 48 bits, regardless
    of the 64-bit pointer space)) will dominate.

    $ ulimit -a
    address space limit (Kibytes) (-M) unlimited
    core file size (blocks) (-c) 0
    cpu time (seconds) (-t) unlimited
    data size (Kibytes) (-d) unlimited
    file size (blocks) (-f) unlimited
    locks (-x) unlimited
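
    The same limits can be read from inside a program. The following is a
    minimal sketch, not code from the thread, that uses the POSIX
    getrlimit() call; the helper name describe_soft_limit is hypothetical,
    while RLIMIT_STACK, RLIMIT_DATA and RLIM_INFINITY are the standard POSIX
    names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: writes the soft limit for `resource` into buf,
   either as a byte count or as the word "unlimited".
   Returns 0 on success, -1 if getrlimit() fails. */
int describe_soft_limit(int resource, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        snprintf(buf, buflen, "unlimited");   /* no limit set by the OS */
    else
        snprintf(buf, buflen, "%llu", (unsigned long long)rl.rlim_cur);
    return 0;
}
```

    Calling it with RLIMIT_STACK and RLIMIT_DATA gives the stack and data
    segment limits; note that getrlimit() reports bytes where the shell
    listing above reports Kibytes.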

  • From Michael S@21:1/5 to Tim Rentsch on Mon Mar 25 01:04:32 2024
    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).
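
    To make the frontier argument concrete, here is a minimal sketch of a
    queue-based (breadth-first) 4-neighbour fill that reports how deep its
    FIFO ever got. It is not the code discussed in the thread; the name
    fill_fifo_peak is hypothetical, and for simplicity the queue is a plain
    array sized for the whole field rather than a ring buffer.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char Color;

/* Breadth-first 4-neighbour fill. Returns the peak number of queued
   pixels, 0 if nothing was recoloured, or -1 if allocation fails. */
long fill_fifo_peak(Color *image, int width, int height,
                    int x0, int y0, Color old, Color new_color)
{
    assert(image != NULL && width > 0 && height > 0);
    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;

    const size_t w = (size_t)width, h = (size_t)height;
    const size_t start = (size_t)y0 * w + (size_t)x0;
    if (old == new_color || image[start] != old)
        return 0;

    /* Each pixel is coloured when enqueued, so it is enqueued at most once. */
    size_t *queue = malloc(w * h * sizeof *queue);
    if (!queue)
        return -1;

    size_t head = 0, tail = 0;
    long peak = 0;
    image[start] = new_color;
    queue[tail++] = start;

    while (head != tail) {
        if ((long)(tail - head) > peak)
            peak = (long)(tail - head);      /* current frontier size */

        size_t p = queue[head++];
        size_t x = p % w, y = p / w;

        if (x + 1 < w && image[p + 1] == old) { image[p + 1] = new_color; queue[tail++] = p + 1; }
        if (x > 0     && image[p - 1] == old) { image[p - 1] = new_color; queue[tail++] = p - 1; }
        if (y > 0     && image[p - w] == old) { image[p - w] = new_color; queue[tail++] = p - w; }
        if (y + 1 < h && image[p + w] == old) { image[p + w] = new_color; queue[tail++] = p + w; }
    }

    free(queue);
    return peak;
}
```

    On a solid square field filled from the centre the returned peak stays
    at a small multiple of N, which is the behaviour described above.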

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder whether the existence of this or a
    similar limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    As for finding a worst case, try this (expressed in pseudo code):

    let pc = { width/2, height/2 }
    // assume pixel field 'field' starts out as all zeroes
    color 8 "legs" with the value '1' as follows:

    leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
    leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
    leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
    leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

    leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
    leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
    leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
    leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }


    So the pixel field should look like this (with longer legs for a
    bigger pixel field), with '-' being 0 and '*' being 1:

    +-----------------------+
    | - - - - - - - - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - * * * * - * * * * - |
    | - - - - - - - - - - - |
    | - * * * * - * * * * - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - - - - - - - - |
    +-----------------------+

    Now start coloring at the center point with a new value
    of 2.
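
    The pattern above can be generated for any field size with a few lines
    of C. This is only a sketch: the function name draw_worst_case_legs is
    hypothetical, and it assumes every leg stops one pixel short of the
    border, as in the picture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char Color;

/* Clears the field to 0 and draws the eight "legs" with value 1,
   leaving the centre row and the centre column free. */
void draw_worst_case_legs(Color *field, int width, int height)
{
    assert(field != NULL && width >= 5 && height >= 5);

    const int cx = width / 2, cy = height / 2;   /* centre point pc */
    memset(field, 0, (size_t)width * (size_t)height);

    /* Horizontal legs on the rows just above and below the centre row. */
    for (int x = 1; x <= width - 2; ++x) {
        if (x == cx)
            continue;                            /* keep centre column open */
        field[(size_t)(cy - 1) * width + x] = 1;
        field[(size_t)(cy + 1) * width + x] = 1;
    }

    /* Vertical legs on the columns just left and right of the centre. */
    for (int y = 1; y <= height - 2; ++y) {
        if (y == cy)
            continue;                            /* keep centre row open */
        field[(size_t)y * width + (cx - 1)] = 1;
        field[(size_t)y * width + (cx + 1)] = 1;
    }
}
```

    Filling from { width/2, height/2 } with the value 2 then gives the test
    case described; for an 11 x 11 field the result matches the picture.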


    Yes, I see. It is close to min(width,height)*4.
    Thank you.

  • From Michael S@21:1/5 to Tim Rentsch on Mon Mar 25 01:28:44 2024
    On Sun, 24 Mar 2024 13:26:16 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount of
    auxiliary memory used in the worst case. Your algorithm appears to
    be the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    You are right. At 1920*1080, except for a few special patterns, your
    code is faster than Malcolm's by a factor of 1.5x to 1.8x. Same for 4K.
    Malcolm's auxiliary memory arrays are still quite small at these image
    sizes, but speed suffers.
    I wonder if it is a problem of the algorithm or of the implementation.
    Since I still haven't figured out his idea, I can't improve his
    implementation in order to check it.

    One thing that I was not expecting at these bigger pictures is the good
    performance of the simple recursive algorithm. Of course, not of the
    original form of it, but of a variation with an explicit stack.
    For many shapes it has quite a large memory footprint, and despite that
    it is not slow. Probably the stack has very good locality of reference.

    Here is the code:

    #include <stdlib.h>
    #include <stddef.h>

    typedef unsigned char Color;

    int floodfill4(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
    return 0;

    const size_t w = width;
    Color* image_end = &image[w*height];

    size_t x = x0;
    Color* row = &image[w*y0];
    if (row[x] != old)
    return 0;

    const ptrdiff_t INITIAL_STACK_SZ = 256;
    unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
    if (!stack)
    return -1;
    unsigned char* sp = stack;
    unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

    enum { ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN, ST_BEG };

    recursive_call:
    row[x] = new;
    if (sp==end_stack) {
    ptrdiff_t size = sp - stack;
    ptrdiff_t new_size = size+size/2;
    unsigned char* new_stack = realloc(stack, new_size * sizeof(*stack));
    if (!new_stack) {
    free(stack);
    return -1;
    }
    stack = new_stack;
    sp = &stack[size];
    end_stack = &stack[new_size];
    }

    for (unsigned state = ST_BEG;;) {
    switch (state) {
    case ST_BEG:

    ++x;
    if (x != width) {
    if (row[x] == old) {
    *sp++ = ST_RIGHT; goto recursive_call; // recursive call
    }
    }
    case ST_RIGHT:
    --x;

    if (x > 0) {
    --x;
    if (row[x] == old) {
    *sp++ = ST_LEFT; goto recursive_call; // recursive call
    }
    case ST_LEFT:
    ++x;
    }

    if (row != image) {
    row -= w;
    if (row[x] == old) {
    *sp++ = ST_UP; goto recursive_call; // recursive call
    }
    case ST_UP:
    row += w;
    }

    row += w;
    if (row != image_end) {
    if (row[x] == old) {
    *sp++ = ST_DOWN; goto recursive_call; // recursive call
    }
    case ST_DOWN: ; // empty statement: a label must be followed by a statement
    }
    row -= w;
    break;
    }

    if (sp == stack)
    break;

    state = *--sp; // pop stack (back to caller)
    }

    free(stack);
    return 1; // done
    }

  • From Michael S@21:1/5 to Michael S on Tue Mar 26 17:52:18 2024
    On Mon, 25 Mar 2024 01:28:44 +0300
    Michael S <[email protected]> wrote:

    On Sun, 24 Mar 2024 13:26:16 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <[email protected]> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount
    of auxiliary memory used in the worst case. Your algorithm
    appears to be the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    You are right. At 1920*1080 except for few special patterns, your
    code is faster than Malcolm's by factor of 1.5x to 1.8. Same for 4K.
    Auxiliary memory arrays of Malcolm are still quite small at these
    image sizes, but speed suffers.
    I wonder if it is a problem of algorithm or of implementation. Since I
    still didn't figure out his idea, I can't improve his implementation
    in order check it.

    One thing that I were not expecting at this bigger pictures, is good
    performance of simple recursive algorithm. Of course, not of original
    form of it, but of variation with explicit stack.
    For many shapes it has quite large memory footprint and despite that
    it is not slow. Probably the stack has very good locality of
    reference.

    Here is the code:
    <snip>


    The most robust code that I have found so far, one that performs well
    with small pictures as well as with large and huge ones, is a variation
    on the same theme of an explicit stack, maybe more properly called
    trace back.
    It operates on 2x2 squares instead of individual pixels.
    The worst-case auxiliary memory footprint of this variant is rather big,
    up to picture_size/4 bytes. The code is *not* simple, but the complexity
    appears to be necessary for robust performance with various shapes and
    sizes.

    Todo-queue-based variants have a very low memory footprint and perform
    well for as long as the recolored shape fits in the fast levels of the
    cache hierarchy, but suffer a sharp slowdown when the shape grows beyond
    that. It seems the problem with these algorithms is that the front
    of recoloring is interleaved and the focus of processing jumps randomly
    across the front, which leads to poor locality and to thrashing of the
    cache. Maybe for huge pictures some sort of priority queue would
    perform better than a simple FIFO? Maybe implemented as a binary heap?
    https://en.wikipedia.org/wiki/Binary_heap

    These thoughts are interesting, but it's unlikely that they could lead
    to faster code than the one presented below.

    #include <stdlib.h>
    #include <stddef.h>

    typedef unsigned char Color;

    static __inline
    unsigned check_column(Color *row, size_t x, size_t w, Color *end_image,
    Color old)
    {
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (row+w != end_image && row[x+w] == old)
    b |= 1 << 2;
    return b;
    }

    static __inline
    unsigned check_row(Color *row, size_t x, size_t w, Color old)
    {
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (x+1 != w && row[x+1] == old)
    b |= 1 << 1;
    return b;
    }

    int floodfill4(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
    return 0;

    const size_t w = width;

    size_t col0 = x0;
    Color* row0 = &image[w * y0];
    if (row0[col0] != old)
    return 0;

    int offs = 0;
    if (y0 & 1) {
    row0 -= w;
    offs = 2;
    }
    if (col0 & 1) {
    col0 -= 1;
    offs |= 1;
    }

    Color* end_image = &image[w * height];

    const ptrdiff_t INITIAL_STACK_SZ = 256;
    unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
    if (!stack)
    return -1;
    unsigned char* sp = stack;
    unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

    enum {
    // state
    ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN,
    ST_BEG,
    STATE_BITS = 3,
    // mask
    MSK_B00 = 1 << 2, MSK_B01 = 1 << 3,
    MSK_B10 = 1 << 4, MSK_B11 = 1 << 5,
    MSK_B0x = MSK_B00 | MSK_B01,
    MSK_B1x = MSK_B10 | MSK_B11,
    MSK_Bx0 = MSK_B00 | MSK_B10,
    MSK_Bx1 = MSK_B01 | MSK_B11,
    MSK_Bxx = MSK_Bx0 | MSK_Bx1,
    MSK_BITS = MSK_Bxx,
    // from
    FROM_LEFT = 0 << 6,
    FROM_RIGHT = 1 << 6,
    FROM_UP = 2 << 6,
    FROM_DOWN = 3 << 6,
    FROM_BITS = 3 << 6,
    };

    unsigned bit_mask0 = check_row(row0, col0, w, old)*MSK_B00;
    if (row0+w != end_image)
    bit_mask0 |= check_row(row0+w, col0, w, old)*MSK_B10;
    static const unsigned char kill_diag_tab[4][2] = {
    {MSK_B01 | MSK_B10, ~MSK_B11},
    {MSK_B00 | MSK_B11, ~MSK_B10},
    {MSK_B00 | MSK_B11, ~MSK_B01},
    {MSK_B01 | MSK_B10, ~MSK_B00},
    };
    if ((bit_mask0 & kill_diag_tab[offs][0])==0)
    bit_mask0 &= kill_diag_tab[offs][1];

    for (int rep = 0; rep < 2; ++rep) {
    unsigned bit_mask = bit_mask0;
    Color* row = row0;
    size_t x = col0;
    unsigned from = rep == 0 ? FROM_DOWN : FROM_LEFT;

    recursive_call:
    if (bit_mask & MSK_B00) row[x+0] = new;
    if (bit_mask & MSK_B01) row[x+1] = new;
    if (bit_mask & MSK_B10) row[x+w+0] = new;
    if (bit_mask & MSK_B11) row[x+w+1] = new;

    if (sp==end_stack) {
    ptrdiff_t size = sp - stack;
    ptrdiff_t new_size = size+size/2;
    unsigned char* new_stack = realloc(stack, new_size *
    sizeof(*stack));
    if (!new_stack) {
    free(stack);
    return -1;
    }
    stack = new_stack;
    sp = &stack[size];
    end_stack = &stack[new_size];
    }

    for (unsigned state = ST_BEG;;) {
    switch (state) {
    case ST_BEG:

    if (from != FROM_RIGHT && bit_mask & MSK_Bx1) { // look right
    x += 2;
    if (x != w) {
    unsigned bx0 = check_column(row, x, w, end_image, old);
    if (bx0 & (bit_mask/MSK_B01)) {
    // recursive call
    *sp++ = bit_mask | from | ST_RIGHT;
    bit_mask = bx0*MSK_B00;
    x += 1;
    if (x != w) {
    unsigned bx1 = check_column(row, x, w, end_image, old);
    if (bx0 & bx1)
    bit_mask |= bx1*MSK_B01;
    }
    x -= 1;
    from = FROM_LEFT;
    goto recursive_call;
    }
    }
    case ST_RIGHT:
    x -= 2;
    }

    if (from != FROM_LEFT && bit_mask & MSK_Bx0) { // look left
    if (x > 0) {
    unsigned bx1 = check_column(row, x-1, w, end_image, old);
    if (bx1 & (bit_mask/MSK_B00)) {
    // recursive call
    *sp++ = bit_mask | from | ST_LEFT;
    bit_mask = bx1*MSK_B01;
    x -= 2;
    unsigned bx0 = check_column(row, x, w, end_image, old);
    if (bx0 & bx1)
    bit_mask |= bx0*MSK_B00;
    from = FROM_RIGHT;
    goto recursive_call;
    case ST_LEFT:
    x += 2;
    }
    }
    }

    if (from != FROM_UP && bit_mask & MSK_B0x) { // look up
    if (row != image) {
    row -= w;
    unsigned b1x = check_row(row, x, w, old);
    row -= w;
    if (b1x & (bit_mask/MSK_B00)) {
    // recursive call
    *sp++ = bit_mask | from | ST_UP;
    bit_mask = b1x*MSK_B10;
    unsigned b0x = check_row(row, x, w, old);
    if (b0x & b1x)
    bit_mask |= b0x*MSK_B00;
    from = FROM_DOWN;
    goto recursive_call;
    case ST_UP: ; // empty statement: a label must be followed by a statement
    }
    row += w*2;
    }
    }

    if (from != FROM_DOWN && bit_mask & MSK_B1x) { // look down
    row += w*2;
    if (row != end_image) {
    unsigned b0x = check_row(row, x, w, old);
    if (b0x & (bit_mask/MSK_B10)) {
    // recursive call
    *sp++ = bit_mask | from | ST_DOWN;
    bit_mask = b0x*MSK_B00;
    row += w;
    if (row != end_image) {
    unsigned b1x = check_row(row, x, w, old);
    if (b0x & b1x)
    bit_mask |= b1x*MSK_B10;
    }
    row -= w;
    from = FROM_UP;
    goto recursive_call;
    }
    }
    case ST_DOWN:
    row -= w*2;
    }
    break;
    }

    if (sp == stack)
    break;

    unsigned stack_val = *--sp; // pop stack (back to caller)
    state = stack_val & STATE_BITS;
    bit_mask = stack_val & MSK_BITS;
    from = stack_val & FROM_BITS;
    }
    }

    free(stack);
    return 1; // done
    }

  • From Tim Rentsch@21:1/5 to Michael S on Thu Mar 28 23:04:36 2024
    Michael S <[email protected]> writes:

    [..various fill algorithms and how they scale..]

    One thing that I were not expecting at this bigger pictures, is good
    performance of simple recursive algorithm. Of course, not of original
    form of it, but of variation with explicit stack.
    For many shapes it has quite large memory footprint and despite that it
    is not slow. Probably the stack has very good locality of reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

  • From Tim Rentsch@21:1/5 to Scott Lurndal on Thu Mar 28 22:51:21 2024
    [email protected] (Scott Lurndal) writes:

    Tim Rentsch <[email protected]> writes:

    [email protected] (Scott Lurndal) writes:

    Malcolm McLean <[email protected]> writes:

    The conventional wisdom is the opposite. But here, conventional wisdom
    fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

    I was referring to the term of art used in POSIX, where
    unlimited simply means that the operating system doesn't
    limit them [.. elaboration ..]

    The earlier sentence was confusing, as the sentence construction
    suggested "unlimited" was a general term rather than one with a
    specific meaning in POSIX. In any case thank you for the
    education.

  • From Michael S@21:1/5 to Tim Rentsch on Fri Mar 29 15:21:41 2024
    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [..various fill algorithms and how they scale..]

    One thing that I were not expecting at this bigger pictures, is good
    performance of simple recursive algorithm. Of course, not of
    original form of it, but of variation with explicit stack.
    For many shapes it has quite large memory footprint and despite
    that it is not slow. Probably the stack has very good locality of
    reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.


    Yes, the use of switch is clever :(
    It more resembles a computed GO TO in old FORTRAN or indirect jumps in
    asm than an idiomatic C switch. But it is legal* C.


    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?


    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

    The original purpose of enhancement was to amortize non-trivial and
    probably not very fast call stack emulation logic over more than one
    pixel. 2x2 just happens to be the biggest block that still has very
    simple in-block recoloring logic. ~4x reduction in the size of
    auxiliary memory is just a pleasant side effect.

    Exactly the same 4x reduction in memory size could have been achieved
    with single-pixel variant by using packed array for 2-bit state
    (==trace back) stack elements. But it would be the same or slower than
    original while the enhanced variant is robustly faster than original.
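
    For illustration, a packed stack of 2-bit elements could look like the
    sketch below. It is not the code from either posted version; the type
    Stack2 and the functions stack2_push and stack2_pop are hypothetical
    names, and the growth policy (start at 64 bytes, then double) is an
    arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packed stack: four 2-bit entries per byte. */
typedef struct {
    unsigned char *bytes;   /* storage, grown on demand */
    size_t count;           /* number of 2-bit entries currently stored */
    size_t cap_bytes;       /* allocated size of `bytes` */
} Stack2;

/* Pushes v (0..3). Returns 0 on success, -1 if the stack cannot grow. */
int stack2_push(Stack2 *s, unsigned v)
{
    assert(s != NULL && v < 4);

    size_t idx = s->count / 4;
    unsigned shift = (unsigned)(s->count % 4) * 2;

    if (idx == s->cap_bytes) {
        size_t new_cap = s->cap_bytes ? s->cap_bytes * 2 : 64;
        unsigned char *p = realloc(s->bytes, new_cap);
        if (!p)
            return -1;
        s->bytes = p;
        s->cap_bytes = new_cap;
    }

    if (shift == 0)     /* first entry in this byte: overwrite it whole */
        s->bytes[idx] = (unsigned char)v;
    else                /* clear the 2-bit slot, then store the new value */
        s->bytes[idx] = (unsigned char)((s->bytes[idx] & ~(3u << shift)) | (v << shift));

    s->count++;
    return 0;
}

/* Pops and returns the most recently pushed value. */
unsigned stack2_pop(Stack2 *s)
{
    assert(s != NULL && s->count > 0);
    s->count--;
    return (s->bytes[s->count / 4] >> ((s->count % 4) * 2)) & 3u;
}
```

    The extra shift and mask on every push and pop is the overhead referred
    to above: memory drops by 4x, but each stack access does more work.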

    After implementing the first enhancement I noticed that at 4K
    size the timing (per pixel) for a few of my test cases is significantly
    worse than at smaller images. So, I added another enhancement aiming to
    minimize cache thrashing effects by never looking back at the immediate
    parent of the current block. The info about the location of the parent
    fitted nicely into the remaining 2 bits of the stack octet.


    ------
    * - the versions I posted are not exactly legal C; they are illegal C
    that gcc does not reject. But they can be trivially made into legal C by
    adding a semicolon after one of the case labels.

  • From Tim Rentsch@21:1/5 to Michael S on Fri Mar 29 23:58:26 2024
    Michael S <[email protected]> writes:

    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [..various fill algorithms and how they scale..]

    One thing that I were not expecting at this bigger pictures, is
    good performance of simple recursive algorithm. Of course, not of
    original form of it, but of variation with explicit stack. For
    many shapes it has quite large memory footprint and despite that
    it is not slow. Probably the stack has very good locality of
    reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.

    Yes, the use of switch is clever :(

    In my view the cleverness is how "recursion" is accomplished by a
    means of a combination of using a stack to store a "return address"
    and restoring state by undoing a change rather than storing the
    old value. Using a switch() is just a detail (and to my way of
    thinking how the switch() is done here obscures the basic idea and
    makes the code harder to understand, but never mind that).
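
    Stripped of the switch(), the basic idea can be sketched as below. This
    is neither poster's code, only an illustration under stated assumptions:
    the name fill_backtrack and the dx/dy direction tables are hypothetical,
    and one byte is stored per step, holding only the direction taken. On
    "return" the position is restored by undoing that step, and the search
    resumes with the next direction.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char Color;

/* Returns 1 if pixels were recoloured, 0 if there was nothing to do,
   -1 if the trace-back stack could not be allocated or grown (in that
   case the image may be partly recoloured). */
int fill_backtrack(Color *image, int width, int height,
                   int x0, int y0, Color old, Color new_color)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };

    assert(image != NULL && width > 0 && height > 0);
    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;
    if (old == new_color || image[(size_t)y0 * width + x0] != old)
        return 0;

    size_t cap = 256, depth = 0;
    unsigned char *trail = malloc(cap);   /* one direction per step taken */
    if (!trail)
        return -1;

    int x = x0, y = y0;
    unsigned next = 0;                    /* first direction still to try */
    image[(size_t)y * width + x] = new_color;

    for (;;) {
        unsigned d;
        for (d = next; d < 4; ++d) {      /* look for an unfilled neighbour */
            int nx = x + dx[d], ny = y + dy[d];
            if (nx >= 0 && nx < width && ny >= 0 && ny < height &&
                image[(size_t)ny * width + nx] == old)
                break;
        }
        if (d < 4) {                      /* "call": step into the neighbour */
            if (depth == cap) {
                unsigned char *p = realloc(trail, cap * 2);
                if (!p) { free(trail); return -1; }
                trail = p;
                cap *= 2;
            }
            trail[depth++] = (unsigned char)d;
            x += dx[d];
            y += dy[d];
            image[(size_t)y * width + x] = new_color;
            next = 0;
        } else if (depth > 0) {           /* "return": undo the last step */
            d = trail[--depth];
            x -= dx[d];
            y -= dy[d];
            next = d + 1;
        } else {
            break;                        /* back at the start, all done */
        }
    }

    free(trail);
    return 1;
}
```

    The stack depth equals the length of the current path, so the worst case
    is still proportional to the number of pixels in the region.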

    It more resemble computed GO TO in old FORTRAN or indirect jumps
    in asm than idiomatic C switch. But it is a legal* C.

    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?

    I hope to soon but am unable to right now (and maybe for a week
    or so due to circumstances beyond my control). For sure the
    code is different; whether it is non-trivially different I
    leave for others to judge.

    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

    The original purpose of enhancement was to amortize non-trivial
    and probably not very fast call stack emulation logic over more
    than one pixel. 2x2 just happens to be the biggest block that
    still has very simple in-block recoloring logic. ~4x reduction in
    the size of auxiliary memory is just a pleasant side effect.

    Exactly the same 4x reduction in memory size could have been
    achieved with single-pixel variant by using packed array for 2-bit
    state (==trace back) stack elements. But it would be the same or
    slower than original while the enhanced variant is robustly faster
    than original.

    An alternate idea is to use a 64-bit integer for 32 "top of stack"
    elements, or up to 32 I should say, and a stack with 64-bit values.
    Just an idea, it may not turn out to be useful.

    The few measurements I have done don't show a big difference in
    performance between the two methods. But I admit I wasn't paying
    close attention, and like I said only a few patterns of filling were
    exercised.

    After implementing the first enhancement I paid attention that at
    4K size the timing (per pixel) for few of my test cases is
    significantly worse than at smaller images. So, I added another
    enhancement aiming to minimize cache trashing effects by never
    looking back at immediate parent of current block. The info about
    location of the parent nicely fitted into remaining 2 bits of
    stack octet.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.

    Two further comments.

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?

    Two, and more important, the new algorithm still uses O(NxN) memory
    for an N by N pixel field. We really would like to get that down to
    O(N) memory (and of course run just as fast). Have you looked into
    how that might be done?

  • From Tim Rentsch@21:1/5 to Michael S on Sat Mar 30 00:54:19 2024
    Michael S <[email protected]> writes:

    [...]

    The most robust code that I found so far that performs well both
    with small pictures and with large and huge, is a variation on the
    same theme of explicit stack, may be, more properly called trace
    back. It operates on 2x2 squares instead of individual pixels.

    The worst case auxiliary memory footprint of this variant is rather
    big, up to picture_size/4 bytes. The code is *not* simple, but
    complexity appears to be necessary for robust performance with
    various shapes and sizes.

    [...]

    I took a cursory look just now, after reading your other later
    posting. I think I have a general sense, especially in conjunction
    with the explanatory comments.

    I'm still hoping to find a method that is both fast and has
    good memory use, which is to say O(N) for an NxN pixel field.

    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.

    Incidentally, it looks like your code assumes X varies more rapidly
    than Y, so a "by row" order, whereas my code assumes Y varies more
    rapidly than X, a "by column" order. The difference doesn't matter
    as long as the pixel field is square and the test cases either are
    symmetric about the X == Y axis or duplicate a non-symmetric pattern
    about the X == Y axis. I would like to be able to run comparisons
    between different methods and get usable results without having
    to jump around because of different orientations. I'm not sure
    how to accommodate that.
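
    One possible way to accommodate it, offered only as a sketch, is to run
    every method on each pattern and on its transpose (with the start
    point's coordinates swapped) and report both timings. The helper name
    transpose_field is hypothetical, and both buffers are assumed to be
    stored row by row.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Color;

/* Writes the transpose of src (width x height, row-major) into dst,
   which is then height pixels wide and width pixels high.
   src and dst must not overlap. */
void transpose_field(const Color *src, int width, int height, Color *dst)
{
    assert(src != NULL && dst != NULL && width > 0 && height > 0);

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[(size_t)x * height + y] = src[(size_t)y * width + x];
}
```

    A pattern that favours a "by row" implementation then favours a "by
    column" one in its transposed form, so neither orientation gets a
    built-in advantage across the pair.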

  • From Michael S@21:1/5 to Tim Rentsch on Sat Mar 30 21:15:06 2024
    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <[email protected]> wrote:


    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.



    Just a habit.
    In "real" work, as opposed to hobby, I use gcc almost exclusively for
    small embedded targets and quite often with third-party libraries in
    source form. In such an environment raising the warning level above
    -Wall would be counterproductive, because it would be hard to see the
    relevant warnings behind walls of false alarms.
    Maybe for hobby code, where I have full control over everything,
    switching to -Wpedantic is not a bad idea.



    An alternate idea is to use a 64-bit integer for 32 "top of stack"
    elements, or up to 32 I should say, and a stack with 64-bit values.
    Just an idea, it may not turn out to be useful.


    That's just a detail of how to do pack/unpack with minimal
    overhead. It does not change the principle: the 'packed' version would
    be less memory hungry, but on a modern PC with GBs of RAM it would not
    be faster than the original.
    Memory footprint can directly affect speed when access patterns have
    poor locality or when the rate of access exceeds 10-20 GB/s. In our
    case the locality of stack access is very good and the rate of stack
    access, even on an ultra-fast processor, is less than 1 GB/s.
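
    For concreteness, the packing idea could look roughly like the sketch
    below. It is an illustration only: the type PackedStack and its
    functions are hypothetical, not taken from either poster's code, and
    it assumes 2-bit direction codes as in the plain stack version. It
    makes no claim about speed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t  top;        /* up to 32 two-bit codes, newest in the low bits */
    unsigned  n_top;      /* number of codes currently held in top          */
    uint64_t *words;      /* full 64-bit words spilled from top             */
    size_t    n_words, cap_words;
} PackedStack;

/* Push one 2-bit code; returns 0 if memory could not be obtained. */
static int
packed_push( PackedStack *ps, unsigned code ){
    if( ps->n_top == 32 ){                      /* top is full: spill it */
        if( ps->n_words == ps->cap_words ){
            size_t cap = ps->cap_words ? ps->cap_words * 2 : 16;
            uint64_t *w = realloc( ps->words, cap * sizeof *w );
            if( !w ) return 0;
            ps->words = w, ps->cap_words = cap;
        }
        ps->words[ ps->n_words++ ] = ps->top;
        ps->top = 0, ps->n_top = 0;
    }
    ps->top = ps->top << 2 | (code & 3);
    ps->n_top++;
    return 1;
}

/* Pop the most recent code; returns 0 if the stack is empty. */
static int
packed_pop( PackedStack *ps, unsigned *code ){
    if( ps->n_top == 0 ){
        if( ps->n_words == 0 ) return 0;
        ps->top = ps->words[ --ps->n_words ];   /* reload a full word */
        ps->n_top = 32;
    }
    *code = (unsigned)( ps->top & 3 );
    ps->top >>= 2;
    ps->n_top--;
    return 1;
}

static void
packed_free( PackedStack *ps ){
    free( ps->words );
    *ps = (PackedStack){ 0 };
}
```

    A PackedStack is started as { 0 }. Four codes share each byte, so
    the footprint is about a quarter of a one-byte-per-entry stack.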


    The few measurements I have done don't show a big difference in
    performance between the two methods. But I admit I wasn't paying
    close attention, and like I said only a few patterns of filling
    were exercised.

    After implementing the first enhancement I noticed that at
    4K size the timing (per pixel) for a few of my test cases is
    significantly worse than for smaller images. So I added another
    enhancement aiming to minimize cache thrashing effects by never
    looking back at the immediate parent of the current block. The info
    about the location of the parent fitted nicely into the remaining
    2 bits of the stack octet.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.


    I call it a principle of Lot's wife.
    That is yet another reason to not grow blocks above 2x2.
    For bigger blocks it does not apply.


    Two further comments.

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?


    On my 11-year-old home PC, which was not top-of-the-line even when
    new, for a 4K image (3840 x 2160) with the cross-in-cross shape that I
    took from one of your previous posts, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than a 3-year-old Zen3.


    Two, and more important, the new algorithm still uses O(NxN) memory
    for an N by N pixel field. We really would like to get that down to
    O(N) memory (and of course run just as fast). Have you looked into
    how that might be done?

    Using this particular principle of not saving (x,y) in auxiliary
    storage, I don't believe that it is possible to have a footprint
    smaller than O(W*H).

  • From Tim Rentsch@21:1/5 to Michael S on Sat Mar 30 11:59:03 2024
    Michael S <[email protected]> writes:

    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <[email protected]> wrote:

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?

    Here is the code:

    void
    stack_fill( UI w0, UI h0, UC pixels[], Point p0, UC old, UC new ){
    U64 w = ( assert( w0 > 0 ), w0 );
    U64 h = ( assert( h0 > 0 ), h0 );
    U64 px = ( assert( p0.x < w ), p0.x );
    U64 py = ( assert( p0.y < h ), p0.y );

    UC *x0 = ( assert( pixels ), pixels );
    UC *x = x0 + px*h;
    UC *xm = x0 + h*w - h;

    U64 y0 = 0;
    U64 y = py;
    U64 ym = h-1;

    UC *s0 = malloc( sizeof *s0 );
    UC *s = s0;
    UC *sn = s0 ? s0+1 : s0;

    if( s0 && x[y] == old ) do {
    x[y] = new;
    if( s == sn ){
    U64 s_offset = s - s0;
    U64 n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }

    if( y < ym && x[y+1] == old ){
    y += 1, *s++ = 2; continue; UNDO_UP:
    y -= 1;
    }
    if( y > y0 && x[y-1] == old ){
    y -= 1, *s++ = 3; continue; UNDO_DOWN:
    y += 1;
    }
    if( x < xm && y[x+h] == old ){
    x += h, *s++ = 0; continue; UNDO_LEFT:
    x -= h;
    }
    if( x > x0 && y[x-h] == old ){
    x -= h, *s++ = 1; continue; UNDO_RIGHT:
    x += h;
    }

    if( s == s0 ) break;

    switch( *--s & 3 ){
    case 0: goto UNDO_LEFT;
    case 1: goto UNDO_RIGHT;
    case 2: goto UNDO_UP;
    case 3: goto UNDO_DOWN;
    }
    } while( 1 );

    free( s0 );
    }

  • From Michael S@21:1/5 to Tim Rentsch on Sat Mar 30 21:26:57 2024
    On Sat, 30 Mar 2024 00:54:19 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    The most robust code that I have found so far, one that performs well
    with small pictures as well as with large and huge ones, is a variation
    on the same theme of an explicit stack, maybe more properly called
    trace-back. It operates on 2x2 squares instead of individual pixels.

    The worst case auxiliary memory footprint of this variant is rather
    big, up to picture_size/4 bytes. The code is *not* simple, but
    complexity appears to be necessary for robust performance with
    various shapes and sizes.

    [...]

    I took a cursory look just now, after reading your other later
    posting. I think I have a general sense, especially in conjunction
    with the explanatory comments.

    I'm still hoping to find a method that is both fast and has
    good memory use, which is to say O(N) for an NxN pixel field.

    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.


    I am not 100% sure about the meaning of 'ad hoc', but I'd guess that
    mine are ad hoc too. Below are the shapes that I use apart from solid
    rectangles. I run them at 5 sizes: 25x19, 200x200, 1280x720,
    1920x1080, 3840x2160. That is certainly not enough for correctness
    tests, but I feel that it is sufficient for speed tests.

    static void make_standing_snake(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    for (int y = 0; y < height; ++y) {
    unsigned char* p = &image[y*width];
    if (y % 2 == 0) {
    memset(p, pen_c, width);
    } else {
    memset(p, background_c, width);
    if (y % 4 == 1)
    p[width-1] = pen_c;
    else
    p[0] = pen_c;
    }
    }
    }

    static void make_prostrate_snake(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, background_c, sizeof(*image)*width*height);
    // vertical bars
    for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; x += 2)
    image[y*width+x] = pen_c;

    // connect bars at top
    for (int x = 3; x < width; x += 4)
    image[x] = pen_c;

    // connect bars at bottom
    for (int x = 1; x < width; x += 4)
    image[(height-1)*width+x] = pen_c;
    }


    static void make_slalom(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    const int n_col = width/3;
    const int n_row = (height-3)/4;

    // top row
    // P B B P P P
    for (int col = 0; col < n_col; ++col) {
    unsigned char c = (col & 1)==0 ? background_c : pen_c;
    image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
    }
    for (int x = n_col*3; x < width; ++x)
    image[x] = image[n_col*3-1];

    // main image: consists of 3x4 blocks filled by following pattern
    // P B B
    // P P B
    // B P B
    // P P B
    for (int row = 0; row < n_row; ++row) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[(row*4+1)*width+col*3];
    p[0] = pen_c; p[1] = background_c; p[2] = background_c; p += width;
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
    p[0] = background_c; p[1] = pen_c; p[2] = background_c; p += width;
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
    }
    }

    // near-bottom rows
    // P B B
    for (int y = n_row*4+1; y < height-1; ++y) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[y*width+col*3];
    p[0] = pen_c; p[1] = background_c; p[2] = background_c;
    }
    }

    // bottom row - all P
    // P P P P B B
    unsigned char *b_row = &image[width*(height-1)];
    for (int col = 0; col < n_col; ++col) {
    unsigned char c = (col & 1)==1 ? background_c : pen_c;
    b_row[col*3+0] = pen_c;
    b_row[col*3+1] = c;
    b_row[col*3+2] = c;
    }
    for (int x = n_col*3; x < width; ++x)
    b_row[x] = b_row[n_col*3-1];

    // rightmost columns
    for (int x = n_col*3; x < width; ++x) {
    for (int y = 1; y < height-1; ++y)
    image[y*width+x] = background_c;
    }
    }

    static void make_slalom90(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    const int n_col = (width-3)/4;
    const int n_row = height/3;

    // leftmost column
    // P
    // B
    // B
    // P
    // P
    // P
    for (int row = 0; row < n_row; ++row) {
    unsigned char c = (row & 1)==0 ? background_c : pen_c;
    image[(row*3+0)*width] = pen_c;
    image[(row*3+1)*width] = c;
    image[(row*3+2)*width] = c;
    }
    for (int y = n_row*3; y < height; ++y)
    image[y*width] = image[(n_row*3-1)*width];

    // main image: consists of 4x3 blocks filled by following pattern
    // P P B P
    // B P P P
    // B B B B
    for (int row = 0; row < n_row; ++row) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[(row*3*width)+(col*4+1)];
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p[3] = pen_c; p += width;
    p[0] = background_c; p[1] = pen_c; p[2] = pen_c; p[3] = pen_c; p += width;
    p[0] = background_c; p[1] = background_c; p[2] = background_c; p[3] = background_c;
    }
    }

    // near-rightmost column
    // P
    // B
    // B
    for (int row = 0; row < n_row; ++row) {
    for (int x = n_col*4+1; x < width-1; ++x) {
    unsigned char* p = &image[row*width*3+x];
    p[0*width] = pen_c;
    p[1*width] = background_c;
    p[2*width] = background_c;
    }
    }

    // rightmost column
    // P
    // P
    // P
    // P
    // B
    // B
    unsigned char *r_col = &image[width-1];
    for (int row = 0; row < n_row; ++row) {
    unsigned char c = (row & 1)==1 ? background_c : pen_c;
    r_col[(row*3+0)*width] = pen_c;
    r_col[(row*3+1)*width] = c;
    r_col[(row*3+2)*width] = c;
    }
    for (int y = n_row*3; y < height; ++y)
    r_col[y*width] = r_col[(n_row*3-1)*width];

    // bottom rows
    for (int y = n_row*3; y < height; ++y) {
    for (int x = 1; x < width-1; ++x)
    image[y*width+x] = background_c;
    }
    }

    static void make_crosss_in_cross(
    unsigned char* image,
    int width,
    int height,
    int xc,
    int yc,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, pen_c, width*height);

    if (xc > 1 && xc+1 < width-1 && yc > 1 && yc+1 < height-1) {
    memset(&image[(yc-1)*width+1], background_c, xc-1);
    memset(&image[(yc+1)*width+1], background_c, xc-1);
    memset(&image[(yc-1)*width+xc+1], background_c, width-xc-2);
    memset(&image[(yc+1)*width+xc+1], background_c, width-xc-2);
    for (int y = 1; y < yc; ++y) {
    image[y*width+xc-1] = background_c;
    image[y*width+xc+1] = background_c;
    }
    for (int y = yc+1; y < height-1; ++y) {
    image[y*width+xc-1] = background_c;
    image[y*width+xc+1] = background_c;
    }
    }
    }
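
    As a small step toward correctness tests, here is a sketch of a
    checker. It assumes a pattern whose pen-colored pixels form a single
    connected region (which appears to be the case for the two snakes), so
    that a fill started on any pen pixel must recolor all of them and
    touch nothing else. The name fully_recolored is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker: returns 1 when 'after' equals 'before' with every
   old_c pixel replaced by new_c and every other pixel left untouched,
   0 otherwise.  n is the number of pixels (width*height). */
static int
fully_recolored( const unsigned char *before, const unsigned char *after,
                 size_t n, unsigned char old_c, unsigned char new_c ){
    assert( before && after );
    for( size_t i = 0; i < n; i++ ){
        unsigned char expected = before[i] == old_c ? new_c : before[i];
        if( after[i] != expected ) return 0;
    }
    return 1;
}
```

    The intended use is to keep a copy of the generated image, run the
    fill under test on the original, and then compare the two buffers
    with this function.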


    Incidentally, it looks like your code assumes X varies more rapidly
    than Y, so a "by row" order, whereas my code assumes Y varies more
    rapidly than X, a "by column" order.

    It is not so much about what I assume as about what is cheaper for
    CPU hardware.

    The difference doesn't matter
    as long as the pixel field is square and the test cases either are
    symmetric about the X == Y axis or duplicate a non-symmetric pattern
    about the X == Y axis. I would like to be able to run comparisons
    between different methods and get usable results without having
    to jump around because of different orientations. I'm not sure
    how to accommodate that.

  • From Michael S@21:1/5 to Michael S on Sun Mar 31 10:54:38 2024
    On Sat, 30 Mar 2024 21:15:06 +0300
    Michael S <[email protected]> wrote:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <[email protected]> wrote:


    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?


    On my 11 y.o. and not top-of-the-line even then home PC for 4K
    image (3840 x 2160) with cross-in-cross shape that I took from one of
    your previous post, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than 3 y.o. Zen3.



    I tested on newer hardware: Intel Coffee Lake (Xeon-E 2176G) and AMD
    Zen3 (EPYC 7543P).
    Here I no longer see a significant drop in speed of the 1x1 variant at
    4K size, but I still see that the more complicated variant provides a
    nice speed-up: up to 1.56x on Coffee Lake and up to 3x on Zen3.

  • From Michael S@21:1/5 to Tim Rentsch on Fri Apr 5 17:30:33 2024
    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if the existence of this or a similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.


    It seems that in the worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of a shape for which the memory consumption
    I measured for a 3840x2160 image was almost exactly 4x that for
    1920x1080.

    static void make_fractal_tree_recursive(
    unsigned char* image,
    int width,
    int nx,
    int ny,
    unsigned char pen_c)
    {
    if (nx < 3 && ny < 3) {
    // small rectangle - solid fill
    for (int y = 0; y < ny; ++y)
    for (int x = 0; x < nx; ++x)
    image[width*y+x] = pen_c;
    return;
    }
    if (nx >= ny) {
    int xc = (nx-1)/2;
    if (xc - 1 > 0) { // left sub-plot
    make_fractal_tree_recursive(image, width, xc - 1, ny, pen_c);
    }
    if (xc + 2 < nx) { // right sub-plot
    make_fractal_tree_recursive(&image[xc+2], width,
    nx - (xc + 2), ny, pen_c);
    }
    // draw vertical cross
    for (int y = 0; y < ny; ++y)
    image[width*y+xc] = pen_c;
    int yc = (ny-1)/2;
    image[width*yc+xc-1] = pen_c;
    image[width*yc+xc+1] = pen_c;
    } else {
    int yc = (ny-1)/2;
    if (yc - 1 > 0) { // upper sub-plot
    make_fractal_tree_recursive(image, width, nx, yc - 1, pen_c);
    }
    if (yc + 2 < ny) { // lower sub-plot
    make_fractal_tree_recursive(&image[(yc+2)*width], width, nx,
    ny -(yc + 2), pen_c);
    }
    // draw horizontal cross
    for (int x = 0; x < nx; ++x)
    image[width*yc+x] = pen_c;
    int xc = (nx-1)/2;
    image[width*(yc-1)+xc] = pen_c;
    image[width*(yc+1)+xc] = pen_c;
    }
    }

    static void make_fractal_tree(
    unsigned char* image,
    int width,
    int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, background_c, width*height);
    make_fractal_tree_recursive(image, width, width, height, pen_c);
    }
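
    To reproduce this kind of measurement, a plain breadth-first fill
    that reports its peak queue occupancy may be enough. The sketch below
    is NOT the FIFO code discussed in this thread; the function name
    bfs_fill_peak is hypothetical, and for simplicity the queue buffer
    itself is sized for the whole image, so only the reported peak is
    meaningful, not the actual allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain breadth-first fill on a row-major image (index = y*w + x).
   Returns the peak number of pixels waiting in the queue, or 0 if
   nothing was filled or the queue could not be allocated. */
static size_t
bfs_fill_peak( unsigned char *image, size_t w, size_t h,
               size_t x, size_t y, unsigned char old_c, unsigned char new_c ){
    assert( image && x < w && y < h );
    if( old_c == new_c || image[ y*w + x ] != old_c ) return 0;

    /* every pixel is queued at most once, so w*h entries always suffice */
    size_t *queue = malloc( w * h * sizeof *queue );
    if( !queue ) return 0;

    size_t head = 0, tail = 0, peak = 0;
    image[ y*w + x ] = new_c;
    queue[ tail++ ] = y*w + x;

    while( head < tail ){
        if( tail - head > peak ) peak = tail - head;
        size_t p = queue[ head++ ], px = p % w, py = p / w;
        /* recolor at enqueue time so that no pixel is queued twice */
        if( px + 1 < w && image[ p+1 ] == old_c ){
            image[ p+1 ] = new_c; queue[ tail++ ] = p+1;
        }
        if( px > 0 && image[ p-1 ] == old_c ){
            image[ p-1 ] = new_c; queue[ tail++ ] = p-1;
        }
        if( py + 1 < h && image[ p+w ] == old_c ){
            image[ p+w ] = new_c; queue[ tail++ ] = p+w;
        }
        if( py > 0 && image[ p-w ] == old_c ){
            image[ p-w ] = new_c; queue[ tail++ ] = p-w;
        }
    }
    free( queue );
    return peak;
}
```

    Running it on the same pattern at two sizes and comparing the
    returned peaks shows whether the queue grows with the side length or
    with the area for that pattern.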

  • From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 01:00:34 2024
    Michael S <[email protected]> writes:

    On Sat, 30 Mar 2024 00:54:19 -0700
    Tim Rentsch <[email protected]> wrote:
    [...]
    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.

    I am not 100% sure about the meaning of 'ad hoc', but I'd guess
    that mine are ad hoc too. Below are the shapes that I use apart from
    solid rectangles. I run them at 5 sizes: 25x19, 200x200,
    1280x720, 1920x1080, 3840x2160. That is certainly not enough for
    correctness tests, but I feel that it is sufficient for speed tests.

    [code]

    I got these, thank you.

    Here is a pattern generating function I wrote for my own
    testing. Disclaimer: slightly changed from my original
    source, hopefully any errors inadvertently introduced can
    be corrected easily. Also, it uses the value 0 for the
    background and the value 1 for the pattern to be colored.

    #include <math.h>
    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Pixel;

    extern void
    ellipse_with_hole( Pixel *field, unsigned w, unsigned h ){
    size_t i, j;
    double wc = w/2, hc = h/2;

    double a = (w > h ? wc : hc) -1;
    double b = (w > h ? hc : wc) -1;

    double b3 = 1+6*b/8;
    double radius = b/2.5;
    double cx = w > h ? wc : b3+1;
    double cy = w > h ? b3+1 : hc;

    double focus = sqrt( a*a - b*b );
    double f1x = w > h ? wc - focus : wc;
    double f1y = w > h ? hc : hc - focus;
    double f2x = w > h ? wc + focus : wc;
    double f2y = w > h ? hc : hc + focus;

    memset( field, 0, w*h );

    for( i = 0; i < w; i++ ){
    for( j = 0; j < h; j++ ){
    double dx = i - cx, dy = j - cy;
    double r2 = radius * radius;
    if( dx * dx + dy*dy <= r2 ) continue;
    double dx1 = i - f1x, dy1 = j - f1y;
    double dx2 = i - f2x, dy2 = j - f2y;
    double sum2 = a*2;  /* sum of focal distances on the ellipse boundary */
    double d1 = sqrt( dx1*dx1 + dy1*dy1 );
    double d2 = sqrt( dx2*dx2 + dy2*dy2 );
    if( d1 + d2 > sum2 ) continue;
    field[ i+j*w ] = 1;
    }}
    }

  • From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 01:55:39 2024
    Michael S <[email protected]> writes:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <[email protected]> wrote:

    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.

    Just a habit.
    In "real" work, as opposed to hobby, I use gcc almost exclusively for
    small embedded targets and quite often with third-party libraries in
    source form. In such an environment raising the warning level above
    -Wall would be counterproductive, because it would be hard to see the
    relevant warnings behind walls of false alarms.
    Maybe for hobby code, where I have full control over everything,
    switching to -Wpedantic is not a bad idea.

    My experience with third party libraries is that sometimes they use
    extensions, probably mostly gcc-isms. Not much to be done in that
    case. Of course turning on -pedantic could be done selectively.

    It might be worth an experiment of turning off -Wall while turning
    on -pedantic to see how big or how little the problem is.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.

    I call it a principle of Lot's wife.
    That is yet another reason to not grow blocks above 2x2.
    For bigger blocks it does not apply.

    Here is an updated version of my "stacking" code. On my test
    system (and I don't even know exactly what CPU it has, probably
    about 5 years old) this code runs about 30% faster than your 2x2
    version, averaged across all patterns and all sizes above the
    smallest ones (25x19 and 19x25).

    #include <assert.h>
    #include <stdlib.h>

    typedef unsigned char UC, Color;
    typedef size_t Index, Count;
    typedef struct { Index x, y; } Point;

    extern Count
    stack_plus( UC *field, Index w, Index h, Point p0, Color old, Color new ){
    Index px = ( assert( p0.x < w ), p0.x );
    Index py = ( assert( p0.y < h ), p0.y );

    Index x0 = 0;
    Index x = px;
    Index xm = w-1;

    UC *y0 = field;
    UC *y = y0 + py*w;
    UC *ym = y0 + h*w - w;

    UC *s0 = malloc( 8 * sizeof *s0 );
    UC *s = s0;
    UC *sn = s0 ? s0+8 : s0;

    Count r = 0;

    if( s0 ) goto START_FOUR;

    while( s != s0 ){
    switch( *--s & 15 ){
    case 0: goto UNDO_START_LEFT;
    case 1: goto UNDO_START_RIGHT;
    case 2: goto UNDO_START_UP;
    case 3: goto UNDO_START_DOWN;

    case 4: goto UNDO_LEFT_DOWN;
    case 5: goto UNDO_LEFT_LEFT;
    case 6: goto UNDO_LEFT_UP;

    case 7: goto UNDO_UP_LEFT;
    case 8: goto UNDO_UP_UP;
    case 9: goto UNDO_UP_RIGHT;

    case 10: goto UNDO_RIGHT_UP;
    case 11: goto UNDO_RIGHT_RIGHT;
    case 12: goto UNDO_RIGHT_DOWN;

    case 13: goto UNDO_DOWN_RIGHT;
    case 14: goto UNDO_DOWN_DOWN;
    case 15: goto UNDO_DOWN_LEFT;
    }

    START_FOUR:
    if( y[x] != old ) continue;
    y[x] = new; r++;
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 0; goto START_LEFT; UNDO_START_LEFT:
    x -= 1;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 1; goto START_RIGHT; UNDO_START_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 2; goto START_UP; UNDO_START_UP:
    y -= w;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 3; goto START_DOWN; UNDO_START_DOWN:
    y += w;
    }
    continue;

    START_LEFT:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 5; goto START_LEFT; UNDO_LEFT_LEFT:
    x -= 1;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 4; goto START_DOWN; UNDO_LEFT_DOWN:
    y += w;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 6; goto START_UP; UNDO_LEFT_UP:
    y -= w;
    }
    continue;

    START_UP:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 7; goto START_LEFT; UNDO_UP_LEFT:
    x -= 1;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 9; goto START_RIGHT; UNDO_UP_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 8; goto START_UP; UNDO_UP_UP:
    y -= w;
    }
    continue;

    START_RIGHT:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 11; goto START_RIGHT; UNDO_RIGHT_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 10; goto START_UP; UNDO_RIGHT_UP:
    y -= w;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 12; goto START_DOWN; UNDO_RIGHT_DOWN:
    y += w;
    }
    continue;

    START_DOWN:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 13; goto START_RIGHT; UNDO_DOWN_RIGHT:
    x += 1;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 15; goto START_LEFT; UNDO_DOWN_LEFT:
    x -= 1;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 14; goto START_DOWN; UNDO_DOWN_DOWN:
    y += w;
    }
    continue;

    }

    return free( s0 ), r;
    }

  • From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 02:32:31 2024
    Michael S <[email protected]> writes:

    On Sat, 30 Mar 2024 21:15:06 +0300
    Michael S <[email protected]> wrote:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <[email protected]> wrote:

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?

    On my 11 y.o. and not top-of-the-line even then home PC for 4K
    image (3840 x 2160) with cross-in-cross shape that I took from one of
    your previous post, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than 3 y.o. Zen3.

    I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
    Zen3 (EPYC 7543P).
    Here I no longer see significant drop in speed of the 1x1 variant at 4K
    size, but I still see that more complicated variant provides nice speed
    up. Up to 1.56x on Coffee Lake and up to 3x on Zen3.

    On my test system the numbers are closer and also more evenly
    balanced: ratios range from about 0.70 to about 1.40, roughly
    evenly split with the 2x2 version somewhat better. There was
    one outlier at approximately 1.48. More precisely, the ratios
    have an average of 1.06 (which means the 1x1 version is about
    6 percent slower on average), with a standard deviation of 0.21.
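
    For anyone wanting to produce the same kind of summary from their own
    timings, a minimal sketch follows. It assumes the ratio is taken as
    time(1x1) / time(2x2) per test case and that the population (not
    sample) variance is wanted; the post does not say which was used. The
    names RatioStats and ratio_stats are hypothetical. The standard
    deviation is the square root of the returned variance.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double mean, variance; } RatioStats;

/* Mean and population variance of the per-test ratios t_1x1[i]/t_2x2[i]. */
static RatioStats
ratio_stats( const double *t_1x1, const double *t_2x2, size_t n ){
    assert( t_1x1 && t_2x2 && n > 0 );
    double sum = 0.0, sum_sq = 0.0;
    for( size_t i = 0; i < n; i++ ){
        assert( t_2x2[i] > 0.0 );
        double r = t_1x1[i] / t_2x2[i];
        sum += r;
        sum_sq += r*r;
    }
    double mean = sum / (double) n;
    double variance = sum_sq / (double) n - mean*mean;
    /* guard against a tiny negative value caused by rounding */
    return (RatioStats){ mean, variance < 0.0 ? 0.0 : variance };
}
```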

  • From Tim Rentsch@21:1/5 to Michael S on Wed Apr 10 19:47:11 2024
    Michael S <[email protected]> writes:

    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if existence of this or similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;

    rather than what it used to. (The type may have changed but that
    is incidental; what matters is the value of the initializing
    expression.) I don't know what I was thinking when I wrote the
    previous version, it's just completely wrong.

    It seems that in the worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of a shape for which the memory consumption
    I measured for a 3840x2160 image was almost exactly 4x that for
    1920x1080.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.

    That said, the constant factor for the FIFO algorithm is lower
    than the stack-based algorithms, even taking into account the
    difference in sizes for queue and stack elements. Moreover cases
    where FIFO algorithms are O( NxN ) are unusual and sparse,
    whereas the stack-based algorithms tend to use a lot of memory
    in lots of common and routine cases. On the average FIFO
    algorithms typically use a lot less memory (or so I conjecture).

    [code to generate fractal tree pattern]

    Thank you for this. I incorporated it into my set of test
    patterns more or less as soon as it was posted.

    Now that I have taken some time to play around with different
    algorithms and have been more systematic in doing speed
    comparisons between different algorithms, on different patterns,
    and with a good range of sizes, I have some general thoughts
    to offer.

    Stack-based methods tend to do well on long skinny patterns and
    tend not to do as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and they also tend to use more memory
    in those cases.

    I've been playing around with a more elaborate, mostly FIFO
    method, in hopes of getting something that offers the best
    of both worlds. The results so far are encouraging, but a
    fair amount of tuning has been necessary (and perhaps more
    still is), and comparisons have been done on just the one
    test server I have available. So I don't know how well it
    would hold up on other hardware, including especially more
    recent hardware. Under these circumstances I feel it is
    premature to post actual code, especially since the code
    is still in flux.

    This topic has been more interesting than I was expecting, and
    also more challenging. I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

  • From Michael S@21:1/5 to Tim Rentsch on Thu Apr 11 15:20:33 2024
    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack,
    my intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).
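
    A rough way to see the depth-first versus breadth-first difference
    described above is to count peak pending work on a solid square. The
    sketch below is not code from this thread: dfs_fill_peak, bfs_fill_peak
    and the counters are hypothetical names, the recursive fill mirrors the
    four-neighbor recursion from the start of the thread, and the queue is a
    plain array sized to the whole field rather than a circular buffer. It
    only measures this one easy case and says nothing about the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters: current and peak depth of the recursive fill. */
static size_t cur_depth, peak_depth;

/* Depth-first (recursive) 4-connected fill of a w x h byte field. */
static void dfs_fill(unsigned char *f, int w, int h, int x, int y,
                     unsigned char old_c, unsigned char new_c)
{
    if (x < 0 || y < 0 || x >= w || y >= h || f[x + y * w] != old_c)
        return;
    f[x + y * w] = new_c;
    if (++cur_depth > peak_depth) peak_depth = cur_depth;
    dfs_fill(f, w, h, x + 1, y, old_c, new_c);
    dfs_fill(f, w, h, x - 1, y, old_c, new_c);
    dfs_fill(f, w, h, x, y - 1, old_c, new_c);
    dfs_fill(f, w, h, x, y + 1, old_c, new_c);
    --cur_depth;
}

/* Runs the recursive fill and returns the deepest chain of recoloring calls. */
static size_t dfs_fill_peak(unsigned char *f, int w, int h, int x0, int y0,
                            unsigned char old_c, unsigned char new_c)
{
    assert(f && w > 0 && h > 0);
    cur_depth = peak_depth = 0;
    if (old_c != new_c)              /* equal colors would recurse forever */
        dfs_fill(f, w, h, x0, y0, old_c, new_c);
    return peak_depth;
}

/* Breadth-first fill; returns the peak number of queued pixels
   (0 if nothing was filled or the queue could not be allocated). */
static size_t bfs_fill_peak(unsigned char *f, int w, int h, int x0, int y0,
                            unsigned char old_c, unsigned char new_c)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    size_t head = 0, tail = 0, peak = 0;
    int *q;

    assert(f && w > 0 && h > 0);
    if (old_c == new_c || x0 < 0 || y0 < 0 || x0 >= w || y0 >= h ||
        f[x0 + y0 * w] != old_c)
        return 0;
    /* Pixels are recolored when queued, so each is queued at most once. */
    q = malloc((size_t)w * (size_t)h * sizeof *q);
    if (!q) return 0;

    f[x0 + y0 * w] = new_c;
    q[tail++] = x0 + y0 * w;
    peak = 1;
    while (head != tail) {
        int p = q[head++];
        for (int i = 0; i < 4; ++i) {
            int x = p % w + dx[i], y = p / w + dy[i];
            if (x < 0 || y < 0 || x >= w || y >= h || f[x + y * w] != old_c)
                continue;
            f[x + y * w] = new_c;
            q[tail++] = x + y * w;
            if (tail - head > peak) peak = tail - head;
        }
    }
    free(q);
    return peak;
}
```

    For a solid 32 x 32 field filled from (0,0), the recursive version
    reaches a depth of 32*32 = 1024 frames (it snakes through every row),
    while the queue never holds more than 2*32 entries, because it only ever
    contains pixels from two adjacent anti-diagonals.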

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if existence of this or similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;


    I lost track, sorry. I cannot find your code that contains a line
    similar to this.
    Can you point to the specific post?

    rather than what it used to. (The type may have changed but that
    is incidental; what matters is the value of the initializing
    expression.) I don't know what I was thinking when I wrote the
    previous version, it's just completely wrong.

    It seems that in worst case the strict FIFO algorithm is the same as
    the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of the shape for which I measured
    memory consumption for 3840x2160 image almost exactly 4x as much as
    for 1920x1080.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.


    BTW, I no longer agree with myself about "the rest of them".
    By now, I know at least one method that is O(W*log(H)). It is even
    quite fast for the majority of my test shapes. Unfortunately, [in its
    current form] it is abysmally slow (100x) for a minority of tests.
    [In its current form] it has other disadvantages as well, like
    consuming a non-trivial amount of memory when handling a small spot in
    a big image. But that can be improved. I am less sure that worst-case
    speed can be improved enough to make it generally acceptable.

    I think I have said enough for you to figure out the general principle
    of this algorithm. I don't want to post code here before I try a few
    improvements.

    That said, the constant factor for the FIFO algorithm is lower
    than the stack-based algorithms, even taking into account the
    difference in sizes for queue and stack elements. Moreover cases
    where FIFO algorithms are O( NxN ) are unusual and sparse,
    whereas the stack-based algorithms tend to use a lot of memory
    in lots of common and routine cases. On the average FIFO
    algorithms typically use a lot less memory (or so I conjecture).

    [code to generate fractal tree pattern]

    Thank you for this. I incorporated it into my set of test
    patterns more or less as soon as it was posted.

    Now that I have taken some time to play around with different
    algorithms and have been more systematic in doing speed
    comparisons between different algorithms, on different patterns,
    and with a good range of sizes, I have some general thoughts
    to offer.

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.


    Indeed, with solid shapes it uses more memory. But at least in my tests
    on my hardware with this sort of shapes it is easily faster than
    anything else. The difference vs the best of the rest is especially big
    at 4K images on AMD Zen3 based hardware, but even on Intel Skylake which
    generally serves as equalizer between different algorithms, the speed
    advantage of 2x2 stack is significant.

    I've been playing around with a more elaborate, mostly FIFO
    method, in hopes of getting something that offers the best
    of both worlds. The results so far are encouraging, but a
    fair amount of tuning has been necessary (and perhaps more
    still is), and comparisons have been done on just the one
    test server I have available. So I don't know how well it
    would hold up on other hardware, including especially more
    recent hardware. Under these circumstances I feel it is
    premature to post actual code, especially since the code
    is still in flux.

    This topic has been more interesting than I was expecting, and
    also more challenging.

    That's not the first time in my practice that problems with a simple
    formulation beget interesting challenges.
    Didn't Donald Knuth write 300 or 400 pages about sorting and still
    end up quite far away from exhausting the topic?

    I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

    So why not break it down to smaller pieces ?
    Myself, I have no rules. In my real work I am quite happy with
    dispatchers of network messages that are 250-300 lines long. But if I
    had this sort of rules, I'd certainly decompose.

  • From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 21:06:38 2024
    Michael S <[email protected]> writes:

    (I'm replying in pieces.)

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;

    I lost track, sorry. I cannot find your code that contains a line
    similar to this.
    Can you point to the specific post?

    Easier for me just to repost the corrected algorithm. The
    type UC is an unsigned char, the types Index and Count are
    size_t (or maybe unsigned long long), the type U32 is a
    32-bit unsigned type.

    Please excuse any minor glitches, I have done some hand
    editing to take out various bits of diagnostic code.

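    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Types as described above. Point is not described in the post;
       a pair of Index values is assumed here. */
    typedef unsigned char UC;
    typedef size_t Index, Count;
    typedef uint32_t U32;
    typedef struct { Index x, y; } Point;

    extern Count
    fifo_fill( UC *field, Index w, Index h, Point p0, UC old, UC new ){
        Index const xm = w-1;
        Index const ym = h-1;

        Index j = 0;                /* queue head: next entry to take */
        Index k = 0;                /* queue tail: next free slot */
        Index n = 1u << 10;         /* capacity, always a power of two */
        Index m = n-1;              /* index mask for wrap-around */
        U32 *todo = malloc( n * sizeof *todo );
        Index x = p0.x;
        Index y = p0.y;

        /* Nothing to do (or no memory); free( NULL ) is harmless. */
        if( !todo || old == new || x >= w || y >= h || field[ x+y*w ] != old ){
            return free( todo ), 0;
        }

        /* Recolor the start pixel up front so an isolated pixel is handled. */
        field[ x+y*w ] = new;
        todo[ k++ ] = x<<16 | y;    /* x and y must each fit in 16 bits */

        while( j != k ){
            Index used = j < k ? k-j : k+n-j;
            Index open = n - used;
            if( open < used/16 ){   /* nearly full: double the buffer */
                Index new_n = n*2;
                Index new_m = new_n-1;
                Index new_j = j < k ? j : j+n;
                U32 *t = realloc( todo, new_n * sizeof *t );
                if( ! t ) break;
                /* If the queue wraps, move the segment [j,n) up by n. */
                if( j != new_j ) memcpy( t+new_j, t+j, (n-j) * sizeof *t );
                todo = t, n = new_n, m = new_m, j = new_j, open = n-used;
            }
            assert( (k-j&m) == used && open+used == n );

            Index const jx = used*3 < open ? k : j+open/3 &m; // here it is!
            while( j != jx ){
                U32 p = todo[j]; j = j+1 &m;
                x = p >> 16, y = p & 0xFFFF;
                if( x > 0 && field[ x-1 + y*w ] == old ){
                    todo[k] = x-1<<16 | y, k = k+1&m, field[ x-1 + y*w ] = new;
                }
                if( y > 0 && field[ x + (y-1)*w ] == old ){
                    todo[k] = x<<16 | y-1, k = k+1&m, field[ x + (y-1)*w ] = new;
                }
                if( x < xm && field[ x+1 + y*w ] == old ){
                    todo[k] = x+1<<16 | y, k = k+1&m, field[ x+1 + y*w ] = new;
                }
                if( y < ym && field[ x + (y+1)*w ] == old ){
                    todo[k] = x<<16 | y+1, k = k+1&m, field[ x + (y+1)*w ] = new;
                }
            }
        }

        return free( todo ), 0;
    }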
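
    The two tricks the routine above leans on, wrapping indices with a
    power-of-two mask and moving the wrapped-around segment when the buffer
    doubles, can be isolated in a small sketch. The Ring type and the ring_*
    names are hypothetical and not from the post. Unlike the routine above,
    this version grows only when the buffer is about to become full (one
    slot is kept free so that head == tail always means empty).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t *buf;
    size_t head, tail;   /* head: next entry to take, tail: next free slot */
    size_t cap;          /* capacity, always a power of two */
} Ring;

/* Allocates the buffer; cap_pow2 must be a power of two, at least 2. */
static int ring_init(Ring *r, size_t cap_pow2)
{
    assert(cap_pow2 >= 2 && (cap_pow2 & (cap_pow2 - 1)) == 0);
    r->buf = malloc(cap_pow2 * sizeof *r->buf);
    r->head = r->tail = 0;
    r->cap = cap_pow2;
    return r->buf != NULL;
}

/* Number of queued entries; unsigned wrap-around plus the mask gives
   the difference modulo cap even when tail < head. */
static size_t ring_used(const Ring *r)
{
    return (r->tail - r->head) & (r->cap - 1);
}

/* Doubles the capacity. If the contents wrap around the end of the old
   buffer, the segment [head, n) is moved to [head+n, 2n) so the queue is
   again contiguous modulo the new capacity. */
static int ring_grow(Ring *r)
{
    size_t n = r->cap;
    uint32_t *t = realloc(r->buf, 2 * n * sizeof *t);
    if (!t) return 0;
    if (r->tail < r->head) {
        memcpy(t + r->head + n, t + r->head, (n - r->head) * sizeof *t);
        r->head += n;
    }
    r->buf = t;
    r->cap = 2 * n;
    return 1;
}

/* Appends v; returns 0 only if growing the buffer failed. */
static int ring_push(Ring *r, uint32_t v)
{
    if (ring_used(r) == r->cap - 1 && !ring_grow(r)) return 0;
    r->buf[r->tail] = v;
    r->tail = (r->tail + 1) & (r->cap - 1);
    return 1;
}

/* Removes the oldest entry into *v; returns 0 if the ring is empty. */
static int ring_pop(Ring *r, uint32_t *v)
{
    if (r->head == r->tail) return 0;
    *v = r->buf[r->head];
    r->head = (r->head + 1) & (r->cap - 1);
    return 1;
}
```

    Masking with cap-1 only works because the capacity is a power of two;
    with any other capacity the wrap would need a modulo or a compare.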

  • From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:09:51 2024
    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with this sort of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even on
    Intel Skylake which generally serves as equalizer between different
    algorithms, the speed advantage of 2x2 stack is significant.

    This comment makes me wonder if I should post my timing results.
    Maybe I will (and including an appropriate disclaimer).

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

  • From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 21:55:22 2024
    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    It seems that in worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of
    re-colored points. Below is an example of the shape for which I
    measured memory consumption for 3840x2160 image almost exactly 4x
    as much as for 1920x1080.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.

    BTW, I no longer agree with myself about "the rest of them".
    By now, I know at least one method that is O(W*log(H)). It is even
    quite fast for the majority of my test shapes. Unfortunately, [in its
    current form] it is abysmally slow (100x) for a minority of tests.
    [In its current form] it has other disadvantages as well, like
    consuming a non-trivial amount of memory when handling a small spot in
    a big image. But that can be improved. I am less sure that worst-case
    speed can be improved enough to make it generally acceptable.

    I think I have said enough for you to figure out the general principle
    of this algorithm. I don't want to post code here before I try a few
    improvements.

    Thank you for the implied compliment. At this point I think the
    probability that I will figure it out anytime soon is pretty low.

  • From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:38:59 2024
    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    This topic has been more interesting than I was expecting, and
    also more challenging.

    That's not the first time in my practice that problems with a
    simple formulation beget interesting challenges.
    Didn't Donald Knuth write 300 or 400 pages about sorting and
    still end up quite far away from exhausting the topic?

    In my copy of volume 3 of TAOCP, the chapter on sorting takes up
    388 pages. On the other hand, only 108 pages of that deals with
    what we normally think of as sorting algorithms today, and even
    that part is longer than it needs to be because of Knuth's
    exhaustive (and exhausting) writing style. Don Knuth would
    never write a book in the style of The C Programming Language.

  • From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:43:10 2024
    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

    So why not break it down to smaller pieces ?

    The better algorithms I have done are long and also make liberal
    use of goto's. Maybe it isn't impossible to break one or more
    of these algorithms into smaller pieces, but C doesn't make it
    easy.

  • From Michael S@21:1/5 to Tim Rentsch on Fri Apr 12 11:13:05 2024
    On Thu, 11 Apr 2024 22:09:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with this sort of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even on
    Intel Skylake which generally serves as equalizer between different
    algorithms, the speed advantage of 2x2 stack is significant.

    This comment makes me wonder if I should post my timing results.
    Maybe I will (and including an appropriate disclaimer).

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768


    I didn't go that far up (I stopped at 4K), and I only test landscape
    sizes. Maybe I'd add a portrait option to see anisotropic behaviors.
    For bigger sizes, correctness is interesting, speed not so much, since
    they are unlikely to be edited in an interactive manner.

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

    My suite is about the same, with the following exceptions:
    1. I didn't add donut yet
    2. + 3 greeds with cell size 2, 3 and 4
    3. + fractal tree
    4. + entire field starting from corner
    It seems, neither of us tests the cases in which linear dimensions of
    the shape are much smaller than those of the field.

    static void make_grid(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c, int cell_sz)
    {
        for (int y = 0; y < height; ++y) {
            unsigned char* p = &image[y*width];
            if (y % cell_sz == 0) {
                memset(p, pen_c, width);
            } else {
                for (int x = 0; x < width; ++x)
                    p[x] = x % cell_sz ? background_c : pen_c;
            }
        }
    }
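
    As a quick sanity check on the pattern, the pen-colored lattice that
    make_grid draws can be counted directly. The sketch below repeats
    make_grid from the post and adds a hypothetical helper,
    count_pen_pixels, which is not from the thread. The counts it yields
    appear to match the "number of points to recolor" that the timing tables
    later in the thread list for the greed(N) rows, which would be
    consistent with those tests filling the lattice starting from the pen
    pixel at (0,0).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* make_grid as posted: every cell_sz-th row and column is pen-colored. */
static void make_grid(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c, int cell_sz)
{
    for (int y = 0; y < height; ++y) {
        unsigned char* p = &image[y*width];
        if (y % cell_sz == 0) {
            memset(p, pen_c, width);
        } else {
            for (int x = 0; x < width; ++x)
                p[x] = x % cell_sz ? background_c : pen_c;
        }
    }
}

/* Hypothetical helper: number of pen-colored pixels in a width x height
   grid pattern, or -1 if the image could not be allocated. */
static long count_pen_pixels(int width, int height, int cell_sz)
{
    enum { BACKGROUND = 0, PEN = 1 };
    long total = (long)width * height;
    long count = 0;
    unsigned char *image;

    assert(width > 0 && height > 0 && cell_sz > 0);
    image = malloc((size_t)total);
    if (!image) return -1;
    make_grid(image, width, height, BACKGROUND, PEN, cell_sz);
    for (long i = 0; i < total; ++i)
        count += image[i] == PEN;
    free(image);
    return count;
}
```

    For example, count_pen_pixels(25, 19, 2) gives 367: ten full rows of 25
    pixels plus 13 pen pixels in each of the other nine rows.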

  • From Tim Rentsch@21:1/5 to Michael S on Fri Apr 12 11:59:25 2024
    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with this sort of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake which generally serves as equalizer between
    different algorithms, the speed advantage of 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

  • From Tim Rentsch@21:1/5 to Michael S on Sat Apr 13 08:30:03 2024
    Michael S <[email protected]> writes:

    On Thu, 11 Apr 2024 22:09:51 -0700
    Tim Rentsch <[email protected]> wrote:
    [...]

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768

    I didn't go that far up (I stopped at 4K)

    I test large sizes for three reasons. One, even if viewable
    area is smaller, virtual displays might be much larger. Two,
    to see how the algorithms scale. Three, larger areas have
    relatively less influence from edge effects.

    Also I have now added

      275 x 25            25 x 275
      400 x 300          300 x 400
      640 x 480          480 x 640
     1600 x 900          900 x 1600
    16000 x 9000        9000 x 16000


    and I only test landscape sizes. Maybe I'd add a portrait option
    to see anisotropic behaviors.

    I decided to do both, one, for symmetry (and there are still some
    applications for portrait mode), and two, to see whether that has
    an effect on behavior (indeed my latest algorithm is anisotropic,
    so it is good to test the flipped sizes).

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

    My suite is about the same, with the following exceptions:
    1. I didn't add donut yet
    2. + 3 greeds with cell size 2, 3 and 4
    3. + fractal tree

    By "fractal" I meant fractal tree. Sorry if that was confusing.

    4. + entire field starting from corner

    I used to do that but took it out as redundant. I've added
    it back now. :)

    It seems, neither of us tests the cases in which linear dimensions
    of the shape are much smaller than those of the field.

    Shouldn't make a difference (for any of the algorithms shown) as
    long as there is at least a 1 pixel border around the pattern.
    Maybe I will add that variation (ick, a lot of work). By the
    way the donut pattern already has a 1 pixel border, ie, does
    not touch any edge.

    static void make_grid(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c, int cell_sz)
    {
        for (int y = 0; y < height; ++y) {
            unsigned char* p = &image[y*width];
            if (y % cell_sz == 0) {
                memset(p, pen_c, width);
            } else {
                for (int x = 0; x < width; ++x)
                    p[x] = x % cell_sz ? background_c : pen_c;
            }
        }
    }

    Ahh, this is what you meant by greed. A nice set of patterns.
    I wrote a variation where the "line width" as well as the
    "hole width" is variable, and added a bunch of those to my
    tests (so a full timing suite now runs for several hours).
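
    That variation was not posted in the thread. Purely as an illustration
    of what a grid with separate line and hole widths could look like, here
    is a sketch. The name make_wide_grid and the line_w / hole_w parameters
    are hypothetical, and the actual code may well differ.

```c
#include <assert.h>

/* Hypothetical generalization of make_grid: pen-colored bands line_w
   pixels wide, separated by background holes hole_w pixels wide, in both
   directions. With line_w == 1 and hole_w == cell_sz - 1 this produces
   the same image as make_grid. */
static void make_wide_grid(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c,
    int line_w, int hole_w)
{
    int period = line_w + hole_w;

    assert(image && line_w > 0 && hole_w > 0);
    for (int y = 0; y < height; ++y) {
        int row_is_line = y % period < line_w;
        for (int x = 0; x < width; ++x) {
            int on_line = row_is_line || x % period < line_w;
            image[y*width + x] = on_line ? pen_c : background_c;
        }
    }
}
```

    Because every pen-colored band touches every crossing band, the pen
    region stays a single connected shape, so one fill from (0,0) should
    still reach all of it.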

  • From Michael S@21:1/5 to Tim Rentsch on Sat Apr 13 20:26:39 2024
    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with this sort of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake which generally serves as equalizer between
    different algorithms, the speed advantage of 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    This particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you could point me to the specific
    post with the code.

  • From Tim Rentsch@21:1/5 to Michael S on Sat Apr 13 10:54:46 2024
    Michael S <[email protected]> writes:

    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <[email protected]> wrote:

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    This particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you could point me to the specific
    post with the code.

    Does this help? Message-ID: <[email protected]>

  • From Michael S@21:1/5 to Tim Rentsch on Sat Apr 13 23:11:59 2024
    On Sat, 13 Apr 2024 10:54:46 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <[email protected]> wrote:

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    This particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you could point me to the specific
    post with the code.

    Does this help? Message-ID: <[email protected]>

    Yes, it does.

  • From Michael S@21:1/5 to Tim Rentsch on Wed Apr 17 00:47:22 2024
    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with this sort of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake which generally serves as equalizer between
    different algorithms, the speed advantage of 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    Finally found the time for speed measurements.

    I tested four algorithms:
    1. stack_2x2 - stack-like processing where each element is 2x2 rectangle
    with Lot's wife amendment.
    2. stack_timr1 - first variant of stack by Tim Rentsch
    3. stack_timr2 - second variant of stack by Tim Rentsch
    4. queue_timr - "take no prisoners" queue by Tim Rentsch, the one with
    power-of-two circular buffer, (x,y) packed to 32 bits and inner loop
    optimized for solid shapes.

    Tests were run on four CPUs:
    1. IVB - Intel Core i7-3570 at 3700 MHz. As far as CPUs go, a
    rather old thing.
    2. HSW - Intel Xeon E3-1271 v3 at 4000 MHz. Only a couple of years
    younger than the above.
    3. SKC - Intel Xeon E-2176G at 4250 MHz. Significantly younger, but the
    microarchitecture has existed since 2015.
    4. ZN3 - AMD EPYC 7543P at 3700 MHz. The only one on my roster whose
    microarchitecture can be considered relatively modern.

    As you can see, with the exception of the oldest CPU, your second stack
    variant is not an improvement over the first.

    What surprised me after I put all the results together was the poor
    showing of SKC. I can't remember any other of my microbenchmarks (and I
    do plenty) where this CPU was so decisively beaten by its older cousin.

    The columns are as follows:
    1. Shape name
    2. Starting point (x,y)
    3. Number of points to recolor
    4. total test duration, seconds
    5. time per pixel - normalized to image area, nsec
    6. time per pixel - normalized to number of points to recolor, nsec
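
    The two normalized columns can be reproduced from the others. The
    sketch below assumes that the number after "*" in each size header
    (for example "[25 x 19] * 421054") is how many times the fill was
    repeated, which the figures are consistent with. The helper name
    ns_per_pixel is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: nanoseconds per pixel when `reps` fills of
   `pixels_per_fill` pixels each took `seconds` in total. Pass the image
   area for column 5, or the number of recolored points for column 6. */
static double ns_per_pixel(double seconds, long pixels_per_fill, long reps)
{
    assert(pixels_per_fill > 0 && reps > 0);
    return seconds * 1e9 / ((double)pixels_per_fill * (double)reps);
}
```

    For the first IVB, stack_2x2 row (solid square, 25 x 19, 0.547 s), this
    gives 0.547e9 / (475 * 421054), about 2.73 ns, for both columns, since
    the whole image is recolored. For the standing snake (259 points,
    0.522 s) it gives about 2.61 ns per image pixel and about 4.79 ns per
    recolored point.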


    IVB,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.547 2.73 2.73
    Solid square ( 0, 0) 475 0.537 2.68 2.68
    standing snake-like shape ( 0, 0) 259 0.522 2.61 4.79
    prostrate snake-like shape ( 0, 0) 259 0.528 2.64 4.84
    slalom shape ( 0, 0) 233 0.459 2.29 4.68
    slalom shape(rotated) ( 0, 0) 223 0.455 2.27 4.85
    cross-in-cross ( 0, 0) 403 0.515 2.57 3.04
    fractal tree ( 12, 0) 247 0.469 2.34 4.51
    greed(2) ( 0, 0) 367 0.558 2.79 3.61
    greed(3) ( 0, 0) 283 0.463 2.31 3.89
    greed(4) ( 0, 0) 223 0.399 1.99 4.25
    donut ( 23, 9) 238 0.305 1.52 3.04
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.461 2.30 2.30
    Solid square ( 0, 0) 40000 0.460 2.30 2.30
    standing snake-like shape ( 0, 0) 20100 0.382 1.91 3.80
    prostrate snake-like shape ( 0, 0) 20100 0.474 2.37 4.71
    slalom shape ( 0, 0) 19802 0.435 2.17 4.39
    slalom shape(rotated) ( 0, 0) 19802 0.450 2.25 4.54
    cross-in-cross ( 0, 0) 39216 0.470 2.35 2.40
    fractal tree ( 99, 0) 18674 0.432 2.16 4.62
    greed(2) ( 0, 0) 30000 0.458 2.29 3.05
    greed(3) ( 0, 0) 22311 0.413 2.06 3.70
    greed(4) ( 0, 0) 17500 0.348 1.74 3.98
    donut ( 199, 100) 25830 0.315 1.57 2.44
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.450 2.24 2.24
    Solid square ( 0, 0) 921600 0.450 2.24 2.24
    standing snake-like shape ( 0, 0) 461160 0.371 1.85 3.69
    prostrate snake-like shape ( 0, 0) 461440 0.469 2.33 4.66
    slalom shape ( 0, 0) 460082 0.437 2.18 4.36
    slalom shape(rotated) ( 0, 0) 460800 0.448 2.23 4.46
    cross-in-cross ( 0, 0) 917616 0.452 2.25 2.26
    fractal tree ( 639, 0) 445860 0.460 2.29 4.73
    greed(2) ( 0, 0) 691200 0.468 2.33 3.11
    greed(3) ( 0, 0) 512160 0.406 2.02 3.64
    greed(4) ( 0, 0) 403200 0.344 1.71 3.91
    donut (1279, 360) 655856 0.326 1.62 2.28
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.453 2.23 2.23
    Solid square ( 0, 0) 2073600 0.457 2.25 2.25
    standing snake-like shape ( 0, 0) 1037340 0.374 1.84 3.68
    prostrate snake-like shape ( 0, 0) 1037760 0.474 2.33 4.66
    slalom shape ( 0, 0) 1036800 0.443 2.18 4.36
    slalom shape(rotated) ( 0, 0) 1036800 0.452 2.22 4.45
    cross-in-cross ( 0, 0) 2067616 0.457 2.25 2.26
    fractal tree ( 959, 0) 1034612 0.453 2.23 4.47
    greed(2) ( 0, 0) 1555200 0.450 2.21 2.95
    greed(3) ( 0, 0) 1152000 0.407 2.00 3.61
    greed(4) ( 0, 0) 907200 0.346 1.70 3.89
    donut (1919, 540) 1477788 0.326 1.60 2.25
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.500 2.32 2.32
    Solid square ( 0, 0) 8294400 0.539 2.50 2.50
    standing snake-like shape ( 0, 0) 4148280 0.449 2.08 4.16
    prostrate snake-like shape ( 0, 0) 4149120 0.746 3.46 6.92
    slalom shape ( 0, 0) 4147200 0.703 3.26 6.52
    slalom shape(rotated) ( 0, 0) 4147200 0.537 2.49 4.98
    cross-in-cross ( 0, 0) 8282416 0.518 2.40 2.41
    fractal tree (1919, 0) 4135652 0.514 2.38 4.78
    greed(2) ( 0, 0) 6220800 0.533 2.47 3.30
    greed(3) ( 0, 0) 4608000 0.468 2.17 3.91
    greed(4) ( 0, 0) 3628800 0.386 1.79 4.09
    donut (3839,1080) 5919706 0.356 1.65 2.31

    IVB,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 1.132 5.66 5.66
    Solid square ( 0, 0) 475 1.171 5.85 5.85
    standing snake-like shape ( 0, 0) 259 0.724 3.62 6.64
    prostrate snake-like shape ( 0, 0) 259 0.712 3.56 6.53
    slalom shape ( 0, 0) 233 0.632 3.16 6.44
    slalom shape(rotated) ( 0, 0) 223 0.632 3.16 6.73
    cross-in-cross ( 0, 0) 403 0.931 4.65 5.49
    fractal tree ( 12, 0) 247 0.537 2.68 5.16
    greed(2) ( 0, 0) 367 0.866 4.33 5.60
    greed(3) ( 0, 0) 283 0.724 3.62 6.08
    greed(4) ( 0, 0) 223 0.618 3.09 6.58
    donut ( 23, 9) 238 0.632 3.16 6.31
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.764 3.82 3.82
    Solid square ( 0, 0) 40000 0.759 3.79 3.79
    standing snake-like shape ( 0, 0) 20100 0.389 1.94 3.87
    prostrate snake-like shape ( 0, 0) 20100 0.400 2.00 3.98
    slalom shape ( 0, 0) 19802 0.388 1.94 3.92
    slalom shape(rotated) ( 0, 0) 19802 0.388 1.94 3.92
    cross-in-cross ( 0, 0) 39216 0.763 3.81 3.89
    fractal tree ( 99, 0) 18674 0.372 1.86 3.98
    greed(2) ( 0, 0) 30000 0.591 2.95 3.94
    greed(3) ( 0, 0) 22311 0.445 2.22 3.99
    greed(4) ( 0, 0) 17500 0.345 1.72 3.94
    donut ( 199, 100) 25830 0.517 2.58 4.00
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.793 3.95 3.95
    Solid square ( 0, 0) 921600 0.868 4.32 4.32
    standing snake-like shape ( 0, 0) 461160 0.420 2.09 4.18
    prostrate snake-like shape ( 0, 0) 461440 0.441 2.20 4.38
    slalom shape ( 0, 0) 460082 0.434 2.16 4.33
    slalom shape(rotated) ( 0, 0) 460800 0.429 2.14 4.27
    cross-in-cross ( 0, 0) 917616 0.801 3.99 4.00
    fractal tree ( 639, 0) 445860 0.389 1.94 4.00
    greed(2) ( 0, 0) 691200 0.614 3.06 4.07
    greed(3) ( 0, 0) 512160 0.424 2.11 3.80
    greed(4) ( 0, 0) 403200 0.334 1.66 3.80
    donut (1279, 360) 655856 0.572 2.85 4.00
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.793 3.90 3.90
    Solid square ( 0, 0) 2073600 0.909 4.47 4.47
    standing snake-like shape ( 0, 0) 1037340 0.415 2.04 4.08
    prostrate snake-like shape ( 0, 0) 1037760 0.442 2.18 4.35
    slalom shape ( 0, 0) 1036800 0.431 2.12 4.24
    slalom shape(rotated) ( 0, 0) 1036800 0.425 2.09 4.18
    cross-in-cross ( 0, 0) 2067616 0.843 4.15 4.16
    fractal tree ( 959, 0) 1034612 0.395 1.94 3.90
    greed(2) ( 0, 0) 1555200 0.614 3.02 4.03
    greed(3) ( 0, 0) 1152000 0.430 2.12 3.81
    greed(4) ( 0, 0) 907200 0.341 1.68 3.84
    donut (1919, 540) 1477788 0.571 2.81 3.94
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.923 4.28 4.28
    Solid square ( 0, 0) 8294400 1.109 5.14 5.14
    standing snake-like shape ( 0, 0) 4148280 0.521 2.42 4.83
    prostrate snake-like shape ( 0, 0) 4149120 1.186 5.50 10.99
    slalom shape ( 0, 0) 4147200 0.938 4.35 8.70
    slalom shape(rotated) ( 0, 0) 4147200 0.545 2.53 5.05
    cross-in-cross ( 0, 0) 8282416 0.999 4.63 4.64
    fractal tree (1919, 0) 4135652 0.433 2.01 4.03
    greed(2) ( 0, 0) 6220800 0.738 3.42 4.56
    greed(3) ( 0, 0) 4608000 0.529 2.45 4.42
    greed(4) ( 0, 0) 3628800 0.417 1.93 4.42
    donut (3839,1080) 5919706 0.666 3.09 4.33


    IVB,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.963 4.81 4.81
    Solid square ( 0, 0) 475 0.990 4.95 4.95
    standing snake-like shape ( 0, 0) 259 0.615 3.07 5.64
    prostrate snake-like shape ( 0, 0) 259 0.673 3.36 6.17
    slalom shape ( 0, 0) 233 0.761 3.80 7.76
    slalom shape(rotated) ( 0, 0) 223 0.815 4.07 8.68
    cross-in-cross ( 0, 0) 403 1.160 5.80 6.84
    fractal tree ( 12, 0) 247 0.740 3.70 7.12
    greed(2) ( 0, 0) 367 1.093 5.46 7.07
    greed(3) ( 0, 0) 283 0.762 3.81 6.39
    greed(4) ( 0, 0) 223 0.621 3.10 6.61
    donut ( 23, 9) 238 0.753 3.76 7.51
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.587 2.93 2.93
    Solid square ( 0, 0) 40000 0.588 2.94 2.94
    standing snake-like shape ( 0, 0) 20100 0.299 1.49 2.97
    prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
    slalom shape ( 0, 0) 19802 0.481 2.40 4.86
    slalom shape(rotated) ( 0, 0) 19802 0.586 2.93 5.92
    cross-in-cross ( 0, 0) 39216 0.609 3.04 3.10
    fractal tree ( 99, 0) 18674 0.539 2.69 5.77
    greed(2) ( 0, 0) 30000 0.909 4.54 6.06
    greed(3) ( 0, 0) 22311 0.468 2.34 4.19
    greed(4) ( 0, 0) 17500 0.371 1.85 4.24
    donut ( 199, 100) 25830 0.418 2.09 3.24
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.602 3.00 3.00
    Solid square ( 0, 0) 921600 0.741 3.69 3.69
    standing snake-like shape ( 0, 0) 461160 0.326 1.62 3.24
    prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
    slalom shape ( 0, 0) 460082 0.519 2.58 5.17
    slalom shape(rotated) ( 0, 0) 460800 0.626 3.12 6.23
    cross-in-cross ( 0, 0) 917616 0.666 3.31 3.33
    fractal tree ( 639, 0) 445860 0.565 2.81 5.81
    greed(2) ( 0, 0) 691200 0.938 4.67 6.23
    greed(3) ( 0, 0) 512160 0.491 2.44 4.40
    greed(4) ( 0, 0) 403200 0.352 1.75 4.00
    donut (1279, 360) 655856 0.450 2.24 3.15
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.611 3.01 3.01
    Solid square ( 0, 0) 2073600 0.759 3.74 3.74
    standing snake-like shape ( 0, 0) 1037340 0.330 1.62 3.25
    prostrate snake-like shape ( 0, 0) 1037760 0.350 1.72 3.44
    slalom shape ( 0, 0) 1036800 0.525 2.58 5.17
    slalom shape(rotated) ( 0, 0) 1036800 0.636 3.13 6.26
    cross-in-cross ( 0, 0) 2067616 0.674 3.32 3.33
    fractal tree ( 959, 0) 1034612 0.605 2.98 5.97
    greed(2) ( 0, 0) 1555200 0.923 4.54 6.06
    greed(3) ( 0, 0) 1152000 0.463 2.28 4.10
    greed(4) ( 0, 0) 907200 0.359 1.77 4.04
    donut (1919, 540) 1477788 0.431 2.12 2.98
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.703 3.26 3.26
    Solid square ( 0, 0) 8294400 0.847 3.93 3.93
    standing snake-like shape ( 0, 0) 4148280 0.400 1.85 3.71
    prostrate snake-like shape ( 0, 0) 4149120 0.815 3.78 7.55
    slalom shape ( 0, 0) 4147200 0.871 4.04 8.08
    slalom shape(rotated) ( 0, 0) 4147200 0.734 3.40 6.81
    cross-in-cross ( 0, 0) 8282416 0.774 3.59 3.59
    fractal tree (1919, 0) 4135652 0.658 3.05 6.12
    greed(2) ( 0, 0) 6220800 1.023 4.74 6.32
    greed(3) ( 0, 0) 4608000 0.554 2.57 4.62
    greed(4) ( 0, 0) 3628800 0.451 2.09 4.78
    donut (3839,1080) 5919706 0.498 2.31 3.24

    IVB,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.828 4.14 4.14
    Solid square ( 0, 0) 475 0.890 4.45 4.45
    standing snake-like shape ( 0, 0) 259 0.642 3.21 5.89
    prostrate snake-like shape ( 0, 0) 259 0.709 3.54 6.50
    slalom shape ( 0, 0) 233 0.589 2.94 6.00
    slalom shape(rotated) ( 0, 0) 223 0.573 2.86 6.10
    cross-in-cross ( 0, 0) 403 0.713 3.56 4.20
    fractal tree ( 12, 0) 247 0.448 2.24 4.31
    greed(2) ( 0, 0) 367 0.675 3.37 4.37
    greed(3) ( 0, 0) 283 0.512 2.56 4.30
    greed(4) ( 0, 0) 223 0.409 2.04 4.36
    donut ( 23, 9) 238 0.439 2.19 4.38

    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.893 4.46 4.46
    Solid square ( 0, 0) 40000 0.786 3.93 3.93
    standing snake-like shape ( 0, 0) 20100 0.555 2.77 5.52
    prostrate snake-like shape ( 0, 0) 20100 0.557 2.78 5.54
    slalom shape ( 0, 0) 19802 0.571 2.85 5.76
    slalom shape(rotated) ( 0, 0) 19802 0.548 2.74 5.53
    cross-in-cross ( 0, 0) 39216 0.736 3.68 3.75
    fractal tree ( 99, 0) 18674 0.569 2.84 6.09
    greed(2) ( 0, 0) 30000 0.615 3.07 4.10
    greed(3) ( 0, 0) 22311 0.453 2.26 4.06
    greed(4) ( 0, 0) 17500 0.357 1.78 4.08
    donut ( 199, 100) 25830 0.531 2.65 4.11

    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.785 3.91 3.91
    Solid square ( 0, 0) 921600 0.761 3.79 3.79
    standing snake-like shape ( 0, 0) 461160 0.551 2.74 5.48
    prostrate snake-like shape ( 0, 0) 461440 0.552 2.75 5.49
    slalom shape ( 0, 0) 460082 0.564 2.81 5.62
    slalom shape(rotated) ( 0, 0) 460800 0.557 2.77 5.54
    cross-in-cross ( 0, 0) 917616 0.755 3.76 3.77
    fractal tree ( 639, 0) 445860 0.448 2.23 4.61
    greed(2) ( 0, 0) 691200 0.645 3.21 4.28
    greed(3) ( 0, 0) 512160 0.481 2.39 4.31
    greed(4) ( 0, 0) 403200 0.377 1.88 4.29
    donut (1279, 360) 655856 0.572 2.85 4.00

    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.854 4.20 4.20
    Solid square ( 0, 0) 2073600 0.829 4.08 4.08
    standing snake-like shape ( 0, 0) 1037340 0.557 2.74 5.48
    prostrate snake-like shape ( 0, 0) 1037760 0.574 2.82 5.64
    slalom shape ( 0, 0) 1036800 0.583 2.87 5.74
    slalom shape(rotated) ( 0, 0) 1036800 0.563 2.77 5.54
    cross-in-cross ( 0, 0) 2067616 0.822 4.05 4.06
    fractal tree ( 959, 0) 1034612 0.468 2.30 4.62
    greed(2) ( 0, 0) 1555200 0.664 3.27 4.36
    greed(3) ( 0, 0) 1152000 0.483 2.38 4.28
    greed(4) ( 0, 0) 907200 0.389 1.91 4.38
    donut (1919, 540) 1477788 0.617 3.04 4.26

    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 1.407 6.52 6.52
    Solid square ( 0, 0) 8294400 1.555 7.21 7.21
    standing snake-like shape ( 0, 0) 4148280 0.596 2.76 5.53
    prostrate snake-like shape ( 0, 0) 4149120 0.851 3.95 7.89
    slalom shape ( 0, 0) 4147200 0.802 3.72 7.44
    slalom shape(rotated) ( 0, 0) 4147200 0.600 2.78 5.56
    cross-in-cross ( 0, 0) 8282416 1.522 7.06 7.07
    fractal tree (1919, 0) 4135652 1.151 5.34 10.70
    greed(2) ( 0, 0) 6220800 1.410 6.54 8.72
    greed(3) ( 0, 0) 4608000 1.450 6.72 12.10
    greed(4) ( 0, 0) 3628800 1.432 6.64 15.18
    donut (3839,1080) 5919706 1.114 5.17 7.24

    HSW,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.391 1.95 1.95
    Solid square ( 0, 0) 475 0.402 2.01 2.01
    standing snake-like shape ( 0, 0) 259 0.389 1.94 3.57
    prostrate snake-like shape ( 0, 0) 259 0.351 1.75 3.22
    slalom shape ( 0, 0) 233 0.360 1.80 3.67
    slalom shape(rotated) ( 0, 0) 223 0.366 1.83 3.90
    cross-in-cross ( 0, 0) 403 0.385 1.92 2.27
    fractal tree ( 12, 0) 247 0.382 1.91 3.67
    greed(2) ( 0, 0) 367 0.416 2.08 2.69
    greed(3) ( 0, 0) 283 0.361 1.80 3.03
    greed(4) ( 0, 0) 223 0.306 1.53 3.26
    donut ( 23, 9) 238 0.225 1.12 2.25
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.344 1.72 1.72
    Solid square ( 0, 0) 40000 0.343 1.71 1.71
    standing snake-like shape ( 0, 0) 20100 0.274 1.37 2.73
    prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
    slalom shape ( 0, 0) 19802 0.333 1.66 3.36
    slalom shape(rotated) ( 0, 0) 19802 0.344 1.72 3.47
    cross-in-cross ( 0, 0) 39216 0.352 1.76 1.79
    fractal tree ( 99, 0) 18674 0.497 2.48 5.32
    greed(2) ( 0, 0) 30000 0.338 1.69 2.25
    greed(3) ( 0, 0) 22311 0.317 1.58 2.84
    greed(4) ( 0, 0) 17500 0.247 1.23 2.82
    donut ( 199, 100) 25830 0.237 1.18 1.83
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.334 1.66 1.66
    Solid square ( 0, 0) 921600 0.332 1.65 1.65
    standing snake-like shape ( 0, 0) 461160 0.263 1.31 2.62
    prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
    slalom shape ( 0, 0) 460082 0.352 1.75 3.51
    slalom shape(rotated) ( 0, 0) 460800 0.346 1.72 3.44
    cross-in-cross ( 0, 0) 917616 0.336 1.67 1.68
    fractal tree ( 639, 0) 445860 0.437 2.18 4.50
    greed(2) ( 0, 0) 691200 0.326 1.62 2.16
    greed(3) ( 0, 0) 512160 0.303 1.51 2.71
    greed(4) ( 0, 0) 403200 0.245 1.22 2.79
    donut (1279, 360) 655856 0.243 1.21 1.70
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.337 1.66 1.66
    Solid square ( 0, 0) 2073600 0.335 1.65 1.65
    standing snake-like shape ( 0, 0) 1037340 0.265 1.30 2.61
    prostrate snake-like shape ( 0, 0) 1037760 0.333 1.64 3.27
    slalom shape ( 0, 0) 1036800 0.344 1.69 3.39
    slalom shape(rotated) ( 0, 0) 1036800 0.342 1.68 3.37
    cross-in-cross ( 0, 0) 2067616 0.338 1.66 1.67
    fractal tree ( 959, 0) 1034612 0.472 2.32 4.66
    greed(2) ( 0, 0) 1555200 0.328 1.61 2.15
    greed(3) ( 0, 0) 1152000 0.305 1.50 2.70
    greed(4) ( 0, 0) 907200 0.245 1.21 2.76
    donut (1919, 540) 1477788 0.244 1.20 1.68
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.375 1.74 1.74
    Solid square ( 0, 0) 8294400 0.402 1.86 1.86
    standing snake-like shape ( 0, 0) 4148280 0.323 1.50 2.99
    prostrate snake-like shape ( 0, 0) 4149120 0.561 2.60 5.20
    slalom shape ( 0, 0) 4147200 0.574 2.66 5.32
    slalom shape(rotated) ( 0, 0) 4147200 0.407 1.89 3.77
    cross-in-cross ( 0, 0) 8282416 0.384 1.78 1.78
    fractal tree (1919, 0) 4135652 0.508 2.36 4.72
    greed(2) ( 0, 0) 6220800 0.395 1.83 2.44
    greed(3) ( 0, 0) 4608000 0.350 1.62 2.92
    greed(4) ( 0, 0) 3628800 0.275 1.28 2.91
    donut (3839,1080) 5919706 0.262 1.21 1.70

    HSW,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.801 4.00 4.00
    Solid square ( 0, 0) 475 0.845 4.22 4.22
    standing snake-like shape ( 0, 0) 259 0.511 2.55 4.69
    prostrate snake-like shape ( 0, 0) 259 0.516 2.58 4.73
    slalom shape ( 0, 0) 233 0.520 2.60 5.30
    slalom shape(rotated) ( 0, 0) 223 0.476 2.38 5.07
    cross-in-cross ( 0, 0) 403 0.694 3.47 4.09
    fractal tree ( 12, 0) 247 0.414 2.07 3.98
    greed(2) ( 0, 0) 367 0.645 3.22 4.17
    greed(3) ( 0, 0) 283 0.552 2.76 4.63
    greed(4) ( 0, 0) 223 0.476 2.38 5.07
    donut ( 23, 9) 238 0.469 2.34 4.68
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.447 2.23 2.23
    Solid square ( 0, 0) 40000 0.444 2.22 2.22
    standing snake-like shape ( 0, 0) 20100 0.229 1.14 2.28
    prostrate snake-like shape ( 0, 0) 20100 0.250 1.25 2.49
    slalom shape ( 0, 0) 19802 0.270 1.35 2.73
    slalom shape(rotated) ( 0, 0) 19802 0.260 1.30 2.62
    cross-in-cross ( 0, 0) 39216 0.459 2.29 2.34
    fractal tree ( 99, 0) 18674 0.260 1.30 2.78
    greed(2) ( 0, 0) 30000 0.387 1.93 2.58
    greed(3) ( 0, 0) 22311 0.295 1.47 2.64
    greed(4) ( 0, 0) 17500 0.231 1.15 2.64
    donut ( 199, 100) 25830 0.316 1.58 2.45
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.457 2.27 2.27
    Solid square ( 0, 0) 921600 0.515 2.56 2.56
    standing snake-like shape ( 0, 0) 461160 0.248 1.23 2.47
    prostrate snake-like shape ( 0, 0) 461440 0.321 1.60 3.19
    slalom shape ( 0, 0) 460082 0.312 1.55 3.11
    slalom shape(rotated) ( 0, 0) 460800 0.285 1.42 2.84
    cross-in-cross ( 0, 0) 917616 0.466 2.32 2.33
    fractal tree ( 639, 0) 445860 0.271 1.35 2.79
    greed(2) ( 0, 0) 691200 0.406 2.02 2.69
    greed(3) ( 0, 0) 512160 0.278 1.38 2.49
    greed(4) ( 0, 0) 403200 0.223 1.11 2.54
    donut (1279, 360) 655856 0.335 1.67 2.34
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.454 2.23 2.23
    Solid square ( 0, 0) 2073600 0.554 2.73 2.73
    standing snake-like shape ( 0, 0) 1037340 0.240 1.18 2.36
    prostrate snake-like shape ( 0, 0) 1037760 0.317 1.56 3.12
    slalom shape ( 0, 0) 1036800 0.306 1.51 3.01
    slalom shape(rotated) ( 0, 0) 1036800 0.293 1.44 2.88
    cross-in-cross ( 0, 0) 2067616 0.511 2.51 2.52
    fractal tree ( 959, 0) 1034612 0.276 1.36 2.72
    greed(2) ( 0, 0) 1555200 0.402 1.98 2.64
    greed(3) ( 0, 0) 1152000 0.283 1.39 2.51
    greed(4) ( 0, 0) 907200 0.224 1.10 2.52
    donut (1919, 540) 1477788 0.331 1.63 2.29
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.566 2.62 2.62
    Solid square ( 0, 0) 8294400 0.708 3.28 3.28
    standing snake-like shape ( 0, 0) 4148280 0.337 1.56 3.12
    prostrate snake-like shape ( 0, 0) 4149120 0.930 4.31 8.62
    slalom shape ( 0, 0) 4147200 0.735 3.41 6.82
    slalom shape(rotated) ( 0, 0) 4147200 0.387 1.79 3.59
    cross-in-cross ( 0, 0) 8282416 0.630 2.92 2.93
    fractal tree (1919, 0) 4135652 0.302 1.40 2.81
    greed(2) ( 0, 0) 6220800 0.516 2.39 3.19
    greed(3) ( 0, 0) 4608000 0.367 1.70 3.06
    greed(4) ( 0, 0) 3628800 0.286 1.33 3.03
    donut (3839,1080) 5919706 0.398 1.85 2.59

    HSW,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.746 3.73 3.73
    Solid square ( 0, 0) 475 0.781 3.90 3.90
    standing snake-like shape ( 0, 0) 259 0.479 2.39 4.39
    prostrate snake-like shape ( 0, 0) 259 0.525 2.62 4.81
    slalom shape ( 0, 0) 233 0.546 2.73 5.57
    slalom shape(rotated) ( 0, 0) 223 0.532 2.66 5.67
    cross-in-cross ( 0, 0) 403 0.804 4.02 4.74
    fractal tree ( 12, 0) 247 0.466 2.33 4.48
    greed(2) ( 0, 0) 367 0.668 3.34 4.32
    greed(3) ( 0, 0) 283 0.552 2.76 4.63
    greed(4) ( 0, 0) 223 0.446 2.23 4.75
    donut ( 23, 9) 238 0.521 2.60 5.20
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.432 2.16 2.16
    Solid square ( 0, 0) 40000 0.430 2.15 2.15
    standing snake-like shape ( 0, 0) 20100 0.223 1.11 2.22
    prostrate snake-like shape ( 0, 0) 20100 0.271 1.35 2.70
    slalom shape ( 0, 0) 19802 0.360 1.80 3.63
    slalom shape(rotated) ( 0, 0) 19802 0.385 1.92 3.89
    cross-in-cross ( 0, 0) 39216 0.468 2.34 2.39
    fractal tree ( 99, 0) 18674 0.457 2.28 4.89
    greed(2) ( 0, 0) 30000 0.468 2.34 3.12
    greed(3) ( 0, 0) 22311 0.353 1.76 3.16
    greed(4) ( 0, 0) 17500 0.275 1.37 3.14
    donut ( 199, 100) 25830 0.340 1.70 2.63
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.450 2.24 2.24
    Solid square ( 0, 0) 921600 0.560 2.79 2.79
    standing snake-like shape ( 0, 0) 461160 0.244 1.21 2.43
    prostrate snake-like shape ( 0, 0) 461440 0.282 1.40 2.80
    slalom shape ( 0, 0) 460082 0.412 2.05 4.11
    slalom shape(rotated) ( 0, 0) 460800 0.414 2.06 4.12
    cross-in-cross ( 0, 0) 917616 0.497 2.47 2.48
    fractal tree ( 639, 0) 445860 0.426 2.12 4.38
    greed(2) ( 0, 0) 691200 0.496 2.47 3.29
    greed(3) ( 0, 0) 512160 0.373 1.86 3.34
    greed(4) ( 0, 0) 403200 0.282 1.40 3.21
    donut (1279, 360) 655856 0.312 1.55 2.18
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.437 2.15 2.15
    Solid square ( 0, 0) 2073600 0.559 2.75 2.75
    standing snake-like shape ( 0, 0) 1037340 0.235 1.16 2.31
    prostrate snake-like shape ( 0, 0) 1037760 0.274 1.35 2.69
    slalom shape ( 0, 0) 1036800 0.396 1.95 3.90
    slalom shape(rotated) ( 0, 0) 1036800 0.407 2.00 4.01
    cross-in-cross ( 0, 0) 2067616 0.490 2.41 2.42
    fractal tree ( 959, 0) 1034612 0.457 2.25 4.51
    greed(2) ( 0, 0) 1555200 0.486 2.39 3.19
    greed(3) ( 0, 0) 1152000 0.346 1.70 3.06
    greed(4) ( 0, 0) 907200 0.268 1.32 3.01
    donut (1919, 540) 1477788 0.297 1.46 2.05
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.523 2.43 2.43
    Solid square ( 0, 0) 8294400 0.649 3.01 3.01
    standing snake-like shape ( 0, 0) 4148280 0.307 1.42 2.85
    prostrate snake-like shape ( 0, 0) 4149120 0.614 2.85 5.69
    slalom shape ( 0, 0) 4147200 0.666 3.09 6.18
    slalom shape(rotated) ( 0, 0) 4147200 0.495 2.30 4.59
    cross-in-cross ( 0, 0) 8282416 0.585 2.71 2.72
    fractal tree (1919, 0) 4135652 0.435 2.02 4.05
    greed(2) ( 0, 0) 6220800 0.583 2.70 3.60
    greed(3) ( 0, 0) 4608000 0.426 1.98 3.56
    greed(4) ( 0, 0) 3628800 0.342 1.59 3.62
    donut (3839,1080) 5919706 0.369 1.71 2.40

    HSW,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.698 3.49 3.49
    Solid square ( 0, 0) 475 0.709 3.54 3.54
    standing snake-like shape ( 0, 0) 259 0.517 2.58 4.74
    prostrate snake-like shape ( 0, 0) 259 0.518 2.59 4.75
    slalom shape ( 0, 0) 233 0.478 2.39 4.87
    slalom shape(rotated) ( 0, 0) 223 0.447 2.23 4.76
    cross-in-cross ( 0, 0) 403 0.577 2.88 3.40
    fractal tree ( 12, 0) 247 0.374 1.87 3.60
    greed(2) ( 0, 0) 367 0.515 2.57 3.33
    greed(3) ( 0, 0) 283 0.409 2.04 3.43
    greed(4) ( 0, 0) 223 0.336 1.68 3.58
    donut ( 23, 9) 238 0.379 1.89 3.78
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.662 3.31 3.31
    Solid square ( 0, 0) 40000 0.619 3.09 3.09
    standing snake-like shape ( 0, 0) 20100 0.443 2.21 4.41
    prostrate snake-like shape ( 0, 0) 20100 0.446 2.23 4.44
    slalom shape ( 0, 0) 19802 0.440 2.20 4.44
    slalom shape(rotated) ( 0, 0) 19802 0.439 2.19 4.43
    cross-in-cross ( 0, 0) 39216 0.629 3.14 3.21
    fractal tree ( 99, 0) 18674 0.618 3.09 6.62
    greed(2) ( 0, 0) 30000 0.477 2.38 3.18
    greed(3) ( 0, 0) 22311 0.364 1.82 3.26
    greed(4) ( 0, 0) 17500 0.289 1.44 3.30
    donut ( 199, 100) 25830 0.482 2.41 3.73
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.669 3.33 3.33
    Solid square ( 0, 0) 921600 0.628 3.13 3.13
    standing snake-like shape ( 0, 0) 461160 0.444 2.21 4.42
    prostrate snake-like shape ( 0, 0) 461440 0.445 2.21 4.42
    slalom shape ( 0, 0) 460082 0.444 2.21 4.43

    [continued in next message]

  • From Tim Rentsch@21:1/5 to Michael S on Wed Apr 17 10:47:25 2024
    Michael S <[email protected]> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.
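
    A minimal sketch of how rows in the posted layout could be read mechanically. The names `struct result_row` and `parse_result_row` are hypothetical, and the meaning of the columns is an assumption: a shape name, a parenthesized coordinate pair, one integer column, and three decimal columns. Shape names can themselves contain parentheses, as in "greed(2)", so the sketch looks for the last '(' on the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one result row; the field names are guesses. */
struct result_row {
    char   shape[64];  /* e.g. "slalom shape(rotated)" */
    int    x, y;       /* the parenthesized pair */
    long   count;      /* the integer column after the pair */
    double val[3];     /* the three trailing decimal columns */
};

/* Returns 1 if `line` looks like a result row, 0 otherwise.
   Section titles and "[W x H] * reps" headers are rejected. */
static int parse_result_row(const char *line, struct result_row *out)
{
    assert(line != NULL && out != NULL);

    /* The coordinate pair is assumed to start at the last '('. */
    const char *paren = strrchr(line, '(');
    if (paren == NULL)
        return 0;
    if (sscanf(paren, "(%d ,%d )%ld%lf%lf%lf",
               &out->x, &out->y, &out->count,
               &out->val[0], &out->val[1], &out->val[2]) != 6)
        return 0;

    /* Everything before the pair is the shape name; trim both ends. */
    while (isspace((unsigned char)*line))
        ++line;
    size_t n = (size_t)(paren - line);
    while (n > 0 && isspace((unsigned char)line[n - 1]))
        --n;
    if (n == 0 || n >= sizeof out->shape)
        return 0;
    memcpy(out->shape, line, n);
    out->shape[n] = '\0';
    return 1;
}
```

    Lines that carry no parenthesized pair, such as the section titles and the size headers, simply return 0, so a caller would have to track the current section and field size itself.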

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

  • From Michael S@21:1/5 to Tim Rentsch on Wed Apr 17 22:41:26 2024
    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.



    What are your absolute results?
    Are they much faster, much slower, or similar to mine?
    It would also help if you could find out the characteristics of your
    test hardware.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does not suck
    with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former, but so far they are
    all unreasonably slow - ~5 times slower than the best.

  • From Tim Rentsch@21:1/5 to Michael S on Fri Apr 19 14:59:20 2024
    Michael S <[email protected]> writes:

    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    What are your absolute results?
    Are they much faster, much slower, or similar to mine?
    It would also help if you could find out the characteristics of
    your test hardware.

    I think trying to look at those wouldn't tell me anything
    helpful. Too many unknowns. And still no way to test or
    measure any changes to the various algorithms.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does
    not suck with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former,

    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N, otherwise I'm not
    sure what O(N) refers to.

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.

  • From Michael S@21:1/5 to Tim Rentsch on Sat Apr 20 21:10:23 2024
    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    What are your absolute results?
    Are they much faster, much slower, or similar to mine?
    It would also help if you could find out the characteristics of
    your test hardware.

    I think trying to look at those wouldn't tell me anything
    helpful. Too many unknowns. And still no way to test or
    measure any changes to the various algorithms.


    Frankly, I don't understand.
    If you have trouble testing on shared hardware, you can always test on
    hardware that you own and fully control.
    Even if it is a little old, the trends tend to be the same. At least I
    clearly see the same trends on my almost 12-year-old home PC and on a
    relatively modern EPYC3.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does
    not suck with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former,

    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N, otherwise I'm not
    sure what O(N) refers to.


    No, I mean O(max(M,N)), plus possibly some logarithmic component that
    loses significance as images grow bigger.
    Moreover, if the bounding rectangle of the shape is A x B, then I'd like
    the memory requirements to be O(max(A,B)), but so far that does not appear
    to be possible, or at least not possible without significant complications
    and further slowdown. So, as an intermediate goal, I am willing to
    accept that the allocation is O(max(M,N)) while the amount of touched
    memory is O(max(A,B)).
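
    To put numbers on the difference between O(M*N) and O(max(M,N)), here is a small arithmetic sketch. Both function names are hypothetical, and the edge-only layout (two flag rows plus two flag columns) is an assumption chosen only to illustrate the growth rates, not a description of any particular implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One flag per pixel of an M x N field, e.g. a visited map: O(M*N). */
static size_t flags_per_pixel(size_t m, size_t n)
{
    assert(m > 0 && n > 0);
    return m * n;
}

/* Assumed edge-only layout: two rows of M flags plus two columns of
   N flags. 2*(M+N) <= 4*max(M,N), so this is O(max(M,N)). */
static size_t flags_per_edge(size_t m, size_t n)
{
    assert(m > 0 && n > 0);
    return 2 * (m + n);
}
```

    For a 3840 x 2160 field that is 8294400 flags in the per-pixel case against 12000 in the edge-only case.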

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.

  • From Michael S@21:1/5 to Michael S on Thu Apr 25 17:56:06 2024
    On Sat, 20 Apr 2024 21:10:23 +0300
    Michael S <[email protected]> wrote:

    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <[email protected]> wrote:


    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N, otherwise I'm not
    sure what O(N) refers to.


    No, I mean O(max(M,N)), plus possibly some logarithmic component that
    loses significance as images grow bigger.
    Moreover, if the bounding rectangle of the shape is A x B, then I'd like
    the memory requirements to be O(max(A,B)), but so far that does not
    appear to be possible, or at least not possible without significant
    complications and further slowdown. So, as an intermediate goal, I am
    willing to accept that the allocation is O(max(M,N)) while the amount
    of touched memory is O(max(A,B)).

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.



    Here is what I had in mind.
    I tried to optimize as little as I could in order to keep it as simple
    as I could. Unfortunately, I am not particularly good at that, so the
    code still contains a few unnecessary "tricks" that make it a little
    harder to understand.
    The code uses VLAs and recursion for the same purpose of making it less
    tricky.
    If desired, the memory footprint could easily be reduced by a factor of 8
    by using packed bit arrays instead of arrays of _Bool.
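
    A minimal sketch of what such a packed bit array could look like. The `bitflags_*` helpers are hypothetical names and are not part of the posted code. The factor of 8 assumes that CHAR_BIT is 8 and that sizeof(_Bool) is 1, which is typical but not guaranteed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bytes needed to store nflags one-bit flags. */
static size_t bitflags_bytes(size_t nflags)
{
    return (nflags + CHAR_BIT - 1) / CHAR_BIT;
}

/* Read flag i: returns 0 or 1. */
static int bitflags_get(const unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Set flag i to 1. */
static void bitflags_set(unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Reset flag i to 0. */
static void bitflags_clear(unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    bits[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}
```

    One consequence of packing: a single flag can no longer be addressed by a plain `_Bool*`, so code that steps such pointers one element at a time would presumably have to carry a base pointer plus a bit index instead.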

    Even in this relatively crude form, this code is blazingly fast for the
    majority of shapes.
    Unfortunately, in the worst case (both 'slalom' shapes) the execution
    time is O(max(A,B)**3), which makes it unfit as a general-purpose routine.
    At the moment I don't see a solution to this problem. Overall, it's
    probably a dead end.

    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;

    struct floodfill4_state {
      Color* image;
      ptrdiff_t width;
      _Bool *l_todo, *r_todo, *u_todo, *d_todo;
      int nx, ny;
      int x, y;
      Color old_color, new_color;
    };

    enum {
      more_r = 1, more_l = 2, more_d = 4, more_u = 8,
      more_lr = more_r+more_l, more_ud=more_u+more_d,
    };

    static
    int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x,
      _Bool* src_todo, _Bool* exp_todo, int lr);
    static
    int floodfill4_expand_ud(struct floodfill4_state* s, int exp_x,
      _Bool* src_todo, _Bool* exp_todo, int ud);

    int floodfill4(Color* image, int width, int height, int x, int y,
      Color old_color, Color new_color)
    {
      if (width <= 0 || height <= 0)
        return 0;

      if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;

      Color* beg = &image[(size_t)width*y+x];
      if (*beg != old_color)
        return 0;

      *beg = new_color;
      // Color* last_row = &image[(size_t)width*(height-1)];
      _Bool lr_todo[2][height];
      _Bool ud_todo[2][width];

      struct floodfill4_state s = {
        .image = beg,
        .width = width,
        .l_todo = &lr_todo[0][y],
        .r_todo = &lr_todo[1][y],
        .u_todo = &ud_todo[0][x],
        .d_todo = &ud_todo[1][x],
        .x = 0, .y = 0, .nx = 1, .ny = 1,
        .old_color = old_color,
        .new_color = new_color,
      };
      *s.l_todo = *s.r_todo = *s.u_todo = *s.d_todo = 1;

      // expansion loop
      for (int more = more_lr+more_ud; more != 0;) {
        if (more & more_lr) {
          _Bool exp_todo[s.ny];
          do {
            if (more & more_r) {
              while (x+s.nx != width) {
                // try to expand to the right
                s.x = s.nx-1;
                int ret = floodfill4_expand_lr(&s, s.nx, s.r_todo,
                  exp_todo, more_r);
                if (!ret)
                  break;
                more |= ret;
                ++s.nx;
              }
              more &= ~more_r;
            }
            if (more & more_l) {
              while (x != 0) {
                // try to expand to the left
                s.x = 0;
                int ret = floodfill4_expand_lr(&s, -1, s.l_todo, exp_todo,
                  more_l);
                if (!ret)
                  break;
                more |= ret;
                ++s.nx;
                --s.image;
                --s.u_todo;
                --s.d_todo;
                --x;
              }
              more &= ~more_l;
            }
          } while (more & more_lr);
        }

        if (more & more_ud) {
          _Bool exp_todo[s.nx];
          do {
            if (more & more_d) {
              while (y+s.ny != height) {
                // try to expand down
                s.y = s.ny-1;
                int ret = floodfill4_expand_ud(&s, s.ny, s.d_todo,
                  exp_todo, more_d);
                if (!ret)
                  break;
                more |= ret;
                ++s.ny;
              }
              more &= ~more_d;
            }
            if (more & more_u) {
              while (y != 0) {
                // try to expand up
                s.y = 0;
                int ret = floodfill4_expand_ud(&s, -1, s.u_todo, exp_todo,
                  more_u);
                if (!ret)
                  break;
                more |= ret;
                ++s.ny;
                s.image -= s.width;
                --s.l_todo;
                --s.r_todo;
                --y;
              }
              more &= ~more_u;
            }
          } while (more & more_ud);
        }
      }
      return 1;
    }

    // floodfill4_core - floodfill4 recursively in divide and conquer fashion
    // s.*_todo arrays are initialized by the caller. floodfill4_core sets
    // values in them that indicate need for further action, but never clears
    // values that were already set
    static void floodfill4_core(const struct floodfill4_state* arg)
    {
    const int nx = arg->nx;
    const int ny = arg->ny;
    if (nx+ny == 2) { // nx==ny==1
    *arg->l_todo = *arg->r_todo = *arg->u_todo = *arg->d_todo = 1;
    *arg->image = arg->new_color;
    return;
    }

    struct floodfill4_state args[2];
    args[0] = args[1] = *arg;
    if (nx > ny) {
    // split vertically
    _Bool todo[2][ny];
    const int hx = nx / 2;

    args[0].r_todo = todo[0];
    args[0].nx = hx;

    args[1].image += hx;
    args[1].l_todo = todo[1];
    args[1].u_todo += hx;
    args[1].d_todo += hx;
    args[1].nx = nx-hx;

    int todo_i;
    int x0 = arg->x;
    if (x0 < hx) { // update left field
    memset(todo[0], 0, ny*sizeof(todo[0][0]));
    floodfill4_core(&args[0]);
    todo_i = 0;
    } else { // update right field
    memset(todo[1], 0, ny*sizeof(todo[0][0]));
    args[1].x = x0 - hx;
    floodfill4_core(&args[1]);
    todo_i = 1;
    }

    args[0].x = hx-1;
    args[1].x = 0;
    for (;;) {
    // look for contact points on destination edge
    _Bool *todo_src = todo[todo_i];
    Color *edge_dst = &arg->image[hx-todo_i];
    int y;
    for (y = 0; y < ny; edge_dst += arg->width, ++y) {
    if (todo_src[y] && *edge_dst == arg->old_color) // contact found
    break;
    }
    if (y == ny)
    break;

    todo_i = 1 - todo_i;
    memset(todo[todo_i], 0, ny*sizeof(todo[0][0]));
    do {
    args[todo_i].y = y;
    floodfill4_core(&args[todo_i]);
    edge_dst += arg->width;
    for (y = y+1; y < ny; edge_dst += arg->width, ++y) {
    if (todo_src[y] && *edge_dst == arg->old_color) // contact found
    break;
    }
    } while (y < ny);
    }
    } else { // ny >= nx
    // split horizontally
    _Bool todo[2][nx];
    const int hy = ny / 2;
    Color* edge = &arg->image[arg->width*hy];

    args[0].d_todo = todo[0];
    args[0].ny = hy;

    args[1].image = edge;
    args[1].u_todo = todo[1];
    args[1].l_todo += hy;
    args[1].r_todo += hy;
    args[1].ny = ny-hy;

    int todo_i;
    int y0 = arg->y;
    if (y0 < hy) { // update up field
    memset(todo[0], 0, nx*sizeof(todo[0][0]));
    floodfill4_core(&args[0]);
    todo_i = 0;
    } else { // update down field
    args[1].y = y0 - hy;
    memset(todo[1], 0, nx*sizeof(todo[0][0]));
    floodfill4_core(&args[1]);
    todo_i = 1;
    }

    args[0].y = hy-1;
    args[1].y = 0;
    for (;;) {
    // look for contact points on destination edge
    _Bool *todo_src = todo[todo_i];
    Color *edge_dst = todo_i ? edge - arg->width : edge;
    int x;
    for (x = 0; x < nx; ++x) {
    if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
    break;
    }
    if (x == nx)
    break;

    todo_i = 1 - todo_i;
    memset(todo[todo_i], 0, nx*sizeof(todo[0][0]));
    do {
    args[todo_i].x = x;
    floodfill4_core(&args[todo_i]);
    for (x = x+1; x < nx; ++x) {
    if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
    break;
    }
    } while (x < nx);
    }
    }
    }


    // return value
    // 0 - not expanded
    // 1 - expanded, no bounce back
    // 2 - expanded, possible bounce back
    static
    int floodfill4_expand(
    Color* pixels, // row or column
    ptrdiff_t incr, // distance between adjacent points of pixels
    int len,
    Color old_color,
    Color new_color,
    _Bool* src_todo,
    _Bool* dst_todo,
    _Bool first)
    {
    for (int i = 0; i < len; pixels += incr, ++i) {
    if (src_todo[i] && *pixels == old_color) {
    // contact found
    if (first)
    memset(dst_todo, 0, len*sizeof(*dst_todo));
    *pixels = new_color;
    dst_todo[i] = 1;
    Color* p = pixels - incr;
    int k;
    for (k = i-1; k >= 0 && *p == old_color; p -= incr, --k) {
    *p = new_color;
    dst_todo[k] = 1;
    }
    _Bool more = k != i-1;
    for (;;) {
    pixels += incr;
    for (i = i+1; i < len && *pixels == old_color; pixels += incr,
    ++i) {
    *pixels = new_color;
    dst_todo[i] = 1;
    more |= src_todo[i] ^ 1;
    }
    if (i >= len)
    break;
    pixels += incr;
    for (i = i+1; i < len && (!src_todo[i] || *pixels !=
    old_color); pixels += incr, ++i);
    if (i >= len)
    break;
    *pixels = new_color;
    dst_todo[i] = 1;
    Color* p = pixels - incr;
    for (k = i-1; *p == old_color; --k, p -= incr) {
    *p = new_color;
    dst_todo[k] = 1;
    }
    more |= k != i-1;
    }
    return more ? 2 : 1;
    }
    }
    return 0; // not expanded
    }

    // return value - more code
    static
    int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x, _Bool* src_todo, _Bool* exp_todo, int lr)
    {
    // try to expand to the right or left
    const int ny = s->ny;
    int ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
    s->old_color, s->new_color, src_todo, exp_todo, 1);
    if (!ret)
    return 0;

    int result = lr;
    while (ret == 2) {
    Color* p = &s->image[s->x];
    _Bool contact = 0;
    for (int y = 0; y < ny; p += s->width, ++y) {
    if (exp_todo[y] && *p == s->old_color) {
    if (!contact)
    memset(src_todo, 0, ny*sizeof(*src_todo));
    s->y = y;
    floodfill4_core(s);
    contact = 1;
    }
    }
    if (!contact)
    break;
    result = more_lr+more_ud;
    ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
    s->old_color, s->new_color, src_todo, exp_todo, 0);
    }

    if ((s->u_todo[exp_x] = exp_todo[0])) result |= more_u;
    if ((s->d_todo[exp_x] = exp_todo[ny-1])) result |= more_d;
    memcpy(src_todo, exp_todo, ny*sizeof(*src_todo));
    return result;
    }

    // return value - more code
    static
    int floodfill4_expand_ud(struct floodfill4_state* s, int exp_y, _Bool* src_todo, _Bool* exp_todo, int ud)
    {
    // try to expand up or down
    const int nx = s->nx;
    int ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
    s->old_color, s->new_color, src_todo, exp_todo, 1);
    if (!ret)
    return 0;

    int result = ud;
    while (ret == 2) {
    Color* p = &s->image[s->width*s->y];
    _Bool contact = 0;
    for (int x = 0; x < nx; ++x) {
    if (exp_todo[x] && p[x] == s->old_color) {
    if (!contact)
    memset(src_todo, 0, nx*sizeof(*src_todo));
    s->x = x;
    floodfill4_core(s);
    contact = 1;
    }
    }
    if (!contact)
    break;
    result = more_lr+more_ud;
    ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
    s->old_color, s->new_color, src_todo, exp_todo, 0);
    }

    if ((s->l_todo[exp_y] = exp_todo[0])) result |= more_l;
    if ((s->r_todo[exp_y] = exp_todo[nx-1])) result |= more_r;
    memcpy(src_todo, exp_todo, nx*sizeof(*src_todo));
    return result;
    }

  • From Michael S@21:1/5 to Michael S on Fri May 3 18:33:05 2024
    On Thu, 25 Apr 2024 17:56:06 +0300
    Michael S <[email protected]> wrote:

    On Sat, 20 Apr 2024 21:10:23 +0300
    Michael S <[email protected]> wrote:

    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <[email protected]> wrote:


    Did you mean you have some algorithms whose worst case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N, otherwise I'm not
    sure what O(N) refers to.


    No, I mean O(max(M,N)) plus possibly some logarithmic component that
    loses significance when images grow bigger.
    More so, if bounding rectangle of the shape is A x B then I'd like
    memory requirements to be O(max(A,B)), but so far it does not appear
    to be possible, or at least not possible without significant
    complications and further slowdown. So, as an intermediate goal I am
    willing to accept that allocation would be O(max(M,N)), but the amount
    of touched memory would be O(max(A,B)).

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.



    Here is what I had in mind.
    I tried to optimize as little as I can in order to make it as simple
    as I can. Unfortunately, I am not particularly good at it, so the code
    still contains a few unnecessary "tricks" that make understanding a
    little harder.
    The code uses VLAs and recursion for the same purpose of making it less
    tricky.
    If desired, the memory footprint could easily be reduced by a factor of
    8 through use of packed bit arrays instead of arrays of _Bool.
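
    The packed variant mentioned above was not posted. A minimal sketch of
    what such a bit array could look like is given here; the names bitword,
    bits_nbytes, bits_clear_all, bits_set and bits_get are invented for
    illustration and do not appear in the posted code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helpers, not part of the posted code.
typedef unsigned char bitword;

// Bytes needed to hold n flags at one bit per flag.
static inline size_t bits_nbytes(size_t n)
{
    return (n + CHAR_BIT - 1) / CHAR_BIT;
}

// Clear all n flags (replaces memset over an array of _Bool).
static inline void bits_clear_all(bitword *b, size_t n)
{
    memset(b, 0, bits_nbytes(n));
}

// Set flag i (replaces todo[i] = 1).
static inline void bits_set(bitword *b, size_t i)
{
    b[i / CHAR_BIT] |= (bitword)(1u << (i % CHAR_BIT));
}

// Read flag i (replaces a test of todo[i]); returns 0 or 1.
static inline int bits_get(const bitword *b, size_t i)
{
    return (b[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

    With 8-bit bytes, 25 flags fit in 4 bytes instead of 25, at the price of
    a shift and a mask on every access.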

    Even in this relatively crude form, for the majority of shapes this code
    is blazingly fast.
    Unfortunately, in the worst case (both 'slalom' shapes) the execution
    time is O(max(A,B)**3), which makes it unfit as a general-purpose
    routine. At the moment I don't see a solution for this problem.
    Overall, it's probably a dead end.


    A solution (sort of) is in line with the famous quote of David Wheeler
    - to turn todo lists from bit maps into arrays of
    abscissas-or-ordinates of contact points.
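
    As a rough illustration of that change of representation, the sketch
    here converts a flag-per-position array into a list of positions. The
    layout (slot 0 holds the number of entries) mirrors floodfill4_add in
    the posted code; list_add and bitmap_to_list are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

// Append val to a count-prefixed list: list[0] is the number of entries,
// list[1..list[0]] are the entries themselves.
static inline void list_add(int *list, int val)
{
    int n = list[0];
    list[n + 1] = val;
    list[0] = n + 1;
}

// Hypothetical helper: turn a flag-per-position array of length len into
// a count-prefixed list of the positions whose flag is set.
// list must have room for len+1 ints.
static inline void bitmap_to_list(const bool *flags, int len, int *list)
{
    assert(len >= 0);
    list[0] = 0; // empty list
    for (int i = 0; i < len; ++i)
        if (flags[i])
            list_add(list, i);
}
```

    Walking such a list costs time proportional to the number of recorded
    contact points, not to the length of the edge.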

    The cost is a memory footprint - 4x bigger than the previous version, 32
    times bigger than above-mentioned "packed" variant of the previous
    version. But in BigO sense it's the same.
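
    The ratios can be checked with a little arithmetic. The sketch here
    counts only the top-level todo arrays as declared in the two posted
    versions (the temporary arrays created inside the recursion are left
    out); the function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

// First version: _Bool lr_todo[2][height] and _Bool ud_todo[2][width].
static inline size_t todo_bytes_bool(size_t width, size_t height)
{
    return 2 * (width + height) * sizeof(_Bool);
}

// The same flags packed one per bit, each array rounded up to whole bytes.
static inline size_t todo_bytes_packed(size_t width, size_t height)
{
    return 2 * ((width + CHAR_BIT - 1) / CHAR_BIT)
         + 2 * ((height + CHAR_BIT - 1) / CHAR_BIT);
}

// List version: int lr_todo[2][height+1] and int ud_todo[2][width+1].
static inline size_t todo_bytes_list(size_t width, size_t height)
{
    return 2 * ((width + 1) + (height + 1)) * sizeof(int);
}
```

    Assuming a 1-byte _Bool, a 4-byte int and 8-bit bytes, a 200x200 image
    gives 800, 100 and 3216 bytes respectively, which is close to the 4x
    and 32x figures quoted above.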

    In my tests it reduced the worst case time from O(max(A,B)**3) to
    O(A*B*log(max(A,B))). Which is non-ideal, but probably acceptable,
    because the bad cases should be very rare in practice.

    The real trouble is different - I don't know if my "worst case" is
    really the worst.

    The code below is for presentation of the algorithm in both a clear and
    compact manner, with emphasis on symmetry between x and y directions.
    It is not optimal in any sense and can be made non-trivially faster both
    by algorithm enhancements and by specialization of critical loops.


    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;

    enum coordinate_axes {
    x_i = 0, y_i, // index of pos[], ld[], 1st index of limits[][], todo[][]
    };
    enum from_to {
    from_i = 0, to_i // 2nd index of limits[][], todo[][], I use 0 and 1 more commonly
    };
    enum { // indices of todo[] lists
    le_i = x_i*2+from_i, ri_i = x_i*2+to_i,
    up_i = y_i*2+from_i, dn_i = y_i*2+to_i,
    };

    #define IDX2INC(ft_idx) ((int)(ft_idx)*2 - 1)
    #define X2Y(axis) ((axis) ^ 1)

    typedef struct {
    Color* image;
    Color old_color, new_color;
    ptrdiff_t ld[2]; // {1, width}
    int limits[2][2]; // {{0, width-1}, {0, height-1}}
    } floodfill4_param;

    typedef struct {
    int *todo[4]; // {left, right, up, down} - first item holds the # of active entries
    int limits[2][2]; // {{x0, x1}, {y0, y1}};
    int pos[2]; // {x, y}
    } floodfill4_state;

    static void floodfill4_core(
    const floodfill4_param* prm,
    const floodfill4_state* arg);
    static _Bool floodfill4_expand(
    const floodfill4_param* prm,
    floodfill4_state* s,
    enum coordinate_axes axis, enum from_to ft_idx);

    int floodfill4(
    Color* image,
    int width, int height,
    int x, int y,
    Color old_color, Color new_color)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;

    if (image[(size_t)width*y+x] != old_color)
    return 0;

    int lr_todo[2][height+1];
    int ud_todo[2][width+1];
    floodfill4_param prm = {
    .image = image,
    .ld[x_i] = 1,
    .ld[y_i] = width,
    .limits = {{ 0, width-1}, {0, height-1}},
    .old_color = old_color,
    .new_color = new_color,
    };
    floodfill4_state s = {
    .todo[le_i] = lr_todo[0],
    .todo[ri_i] = lr_todo[1],
    .todo[up_i] = ud_todo[0],
    .todo[dn_i] = ud_todo[1],
    .limits[x_i] = {x, x},
    .limits[y_i] = {y, y},
    .pos[x_i] = x, .pos[y_i] = y,
    };
    for (int i = 0; i < 4; ++i)
    *s.todo[i] = 0;

    // process central 1x1 rectangle
    floodfill4_core(&prm, &s);

    // expansion loop
    for (unsigned idx = 0; idx < 4;) {
    if (floodfill4_expand(&prm, &s, idx/2, idx % 2)) { // try to expand
    idx = 0; // expansion succeed - restart from beginning
    continue;
    }
    ++idx;
    }

    return 1;
    }

    static __inline
    void floodfill4_add(int* list, int val)
    {
    int n = list[0];
    list[n+1] = val;
    list[0] = n + 1;
    }

    // floodfill4_core - floodfill4 recursively in divide and conquer fashion
    // arg->*_todo arrays (lists) initialized by caller.
    // floodfill4_core adds to *_todo values that indicate need for further
    // action, but never removes anything
    static void floodfill4_core(const floodfill4_param* prm, const floodfill4_state* arg) {
    const int ni[2] = {
    arg->limits[x_i][1]-arg->limits[x_i][0],
    arg->limits[y_i][1]-arg->limits[y_i][0],
    };
    if (ni[x_i] + ni[y_i] == 0) { // nx==ny==1
    prm->image[prm->ld[y_i]*arg->limits[y_i][0]+arg->limits[x_i][0]] = prm->new_color;
    floodfill4_add(arg->todo[le_i], arg->limits[y_i][0]);
    floodfill4_add(arg->todo[ri_i], arg->limits[y_i][0]);
    floodfill4_add(arg->todo[up_i], arg->limits[x_i][0]);
    floodfill4_add(arg->todo[dn_i], arg->limits[x_i][0]);
    return;
    }

    floodfill4_state args[2];
    args[0] = args[1] = *arg;
    const enum coordinate_axes axis = ni[x_i] > ni[y_i] ?
    x_i : // split vertically
    y_i ; // split horizontally
    int todo[2][ni[X2Y(axis)]+2]; // contacts between halves
    const int hpos = (arg->limits[axis][0] + arg->limits[axis][1])/2; // split point
    args[0].todo[axis*2+1] = todo[0];
    args[0].limits[axis][1] = hpos;
    args[1].todo[axis*2+0] = todo[1];
    args[1].limits[axis][0] = hpos + 1;
    int todo_i = arg->pos[axis] > hpos;
    todo[todo_i][0] = 0; // empty todo list
    floodfill4_core(prm, &args[todo_i]);
    if (todo[todo_i][0] != 0) {
    // do ping-pong between halves
    args[0].pos[axis] = hpos;
    args[1].pos[axis] = hpos+1;
    const ptrdiff_t lda = prm->ld[axis];
    const ptrdiff_t ldb = prm->ld[X2Y(axis)];
    Color* edge = &prm->image[lda*hpos];
    do {
    // look for contact points on destination edge
    int* from = todo[todo_i];
    Color *edge_dst = todo_i ? edge : edge + lda;
    todo_i = 1 - todo_i;
    todo[todo_i][0] = 0;
    int np = *from++;
    do {
    int pos = *from++;
    if (edge_dst[pos*ldb] == prm->old_color) { // contact found
    args[todo_i].pos[X2Y(axis)] = pos;
    floodfill4_core(prm, &args[todo_i]);
    }
    } while (--np);
    } while (todo[todo_i][0] != 0);
    }
    }

    static
    _Bool floodfill4_expand(
    const floodfill4_param* prm,
    floodfill4_state* s,
    enum coordinate_axes axis,
    enum from_to ft_idx)
    { // try to expand
    int* src_todo = s->todo[axis*2+ft_idx];
    if (*src_todo == 0)
    return 0;

    int src_pos = s->limits[axis][ft_idx];
    if (src_pos == prm->limits[axis][ft_idx]) {
    *src_todo = 0;
    return 0;
    }

    typedef struct {
    int pos0, pos1;
    } interval_t;

    const ptrdiff_t lda = prm->ld[axis];
    const ptrdiff_t ldb = prm->ld[X2Y(axis)];
    Color* src_col = &prm->image[lda*src_pos];
    Color* exp_col = &src_col[lda*IDX2INC(ft_idx)];
    const int ort_limit0 = s->limits[X2Y(axis)][0];
    const int ort_limit1 = s->limits[X2Y(axis)][1];
    const Color c0 = exp_col[ldb*ort_limit0]; // preserve upper corner
    const Color c1 = exp_col[ldb*ort_limit1]; // preserve lower corner
    interval_t workbuf[(ort_limit1 - ort_limit0+2)/2];
    interval_t* wr = workbuf;
    s->pos[axis] = src_pos;
    int n_todo = src_todo[0];
    do {
    // look for contact
    int pos = src_todo[n_todo--]; // use src_todo as stack, popping from the top
    Color* pt = &exp_col[ldb*pos];
    if (*pt == prm->old_color) { // contact found
    *pt = prm->new_color;
    // extend backward
    Color* p = pt - ldb;
    int pos0;
    for (pos0 = pos-1; pos0 >= ort_limit0 && *p == prm->old_color; p -= ldb, --pos0)
    *p = prm->new_color;
    pos0 += 1;
    // extend forward
    p = pt + ldb;
    int pos1;
    for (pos1 = pos+1; pos1 <= ort_limit1 && *p == prm->old_color; p += ldb, ++pos1)
    *p = prm->new_color;
    pos1 -= 1;

    // add interval to result list
    wr->pos0 = pos0;
    wr->pos1 = pos1;
    ++wr;

    if (pos0 != pos1) {
    // bounce - apply new found interval to original rectangle
    src_todo[0] = n_todo;
    pos = pos0;
    p = &src_col[ldb*pos];
    do {
    if (*p == prm->old_color) { // contact found
    s->pos[X2Y(axis)] = pos;
    floodfill4_core(prm, s);
    n_todo = src_todo[0];
    ++pos;
    p += ldb;
    }
    p += ldb;
    } while (++pos <= pos1);
    }
    }
    } while (n_todo != 0);

    if (wr == workbuf)
    return 0; // rectangle not expanded

    // rectangle expanded
    // handle corners of expanded rectangle
    int exp_pos = src_pos + IDX2INC(ft_idx);
    s->limits[axis][ft_idx] = exp_pos;
    if (exp_col[ldb*ort_limit0] != c0) // corner0 modified
    floodfill4_add(s->todo[X2Y(axis)*2+0], exp_pos); // add to todo0 list
    if (exp_col[ldb*ort_limit1] != c1) // corner1 modified
    floodfill4_add(s->todo[X2Y(axis)*2+1], exp_pos); // add to todo1 list

    // turn intervals to list
    interval_t* rd = workbuf;
    int* dst_todo = &src_todo[1];
    do {
    int pos = rd->pos0;
    int pos1 = rd->pos1;
    do
    *dst_todo++ = pos;
    while (++pos <= pos1);
    ++rd;
    } while (rd != wr);
    src_todo[0] = dst_todo - src_todo - 1;

    return 1;
    }

  • From Tim Rentsch@21:1/5 to Michael S on Wed May 15 09:57:39 2024
    Michael S <[email protected]> writes:

    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    What are your absolute results?
    Are they much faster, much slower or similar to mine?
    Also it would help if you find out characteristics of your
    test hardware.

    I think trying to look at those wouldn't tell me anything
    helpful. Too many unknowns. And still no way to test or
    measure any changes to the various algorithms.

    Frankly, I don't understand.
    If you have trouble with testing on shared hardware then you can
    always test on hardware that you own and have full control of.
    Even if it is a little old, the trends tend to be the same. At
    least I clearly see the same trends on my almost 12 y.o. home PC
    and on relatively modern EPYC3.

    I have put this problem aside. It's a lot of work even if I had
    a way to make substantive progress, and at present I don't.
    Maybe more sometime later but for now I think suspending is the
    only workable choice available.

  • From Michael S@21:1/5 to Michael S on Wed Jun 5 17:59:07 2024
    On Wed, 5 Jun 2024 17:45:45 +0300
    Michael S <[email protected]> wrote:

    On Fri, 3 May 2024 18:33:05 +0300
    Michael S <[email protected]> wrote:

    On Thu, 25 Apr 2024 17:56:06 +0300
    Michael S <[email protected]> wrote:


    A solution (sort of) is in line with the famous quote of David Wheeler
    - to turn todo lists from bit maps into arrays of
    abscissas-or-ordinates of contact points.

    The cost is a memory footprint - 4x bigger than the previous
    version, 32 times bigger than above-mentioned "packed" variant of
    the previous version. But in BigO sense it's the same.

    In my tests it reduced the worst case time from O(max(A,B)**3) to
    O(A*B*log(max(A,B))). Which is non-ideal, but probably acceptable,
    because the bad cases should be very rare in practice.

    The real trouble is different - I don't know if my "worst case" is
    really the worst.

    The code below is for presentation of algorithm in both clear and
    compact manner, with emphasis on symmetry between x and y
    directions. It is not optimal in any sense and can be made
    non-trivially faster both by algorithm enhancements and by
    specialization of critical loops.



    Following code improves on ideas from the previous post.
    Unlike the previous one, it is purely iterative, with no recursion.
    The algorithm is simpler and accesses storage in a more compact manner,
    i.e. all accessed memory area starts from beginning and grows
    according to need. Previous attempt did not have this property.
    It's still longer and less simple than I would like.


    And here is something that I found by chance when developing the code
    presented in the previous post.
    Unlike for the previous one, I can not prove that memory requirements
    of this algorithm are O(N). However, for all my test cases it's not
    just O(N), but consumes significantly less memory than the one above.
    And it is simpler and shorter.

    // HIS - todo stack of Horizontal Intervals
    // with periodic Squeeze of empty intervals
    #include <stddef.h>
    #include <stdlib.h>
    #include <stdint.h>

    typedef unsigned char Color;

    int floodfill4(
    Color* image,
    int width, int height,
    int x, int y,
    Color old_color, Color new_color)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;

    size_t w = width;
    Color* row = &image[w*y];
    if (row[x] != old_color)
    return 0;

    typedef struct {
    int x0, x1, y;
    int8_t from; // -1 => from y-1, +1 => from y+1
    } interval_t;

    enum {
    INITIAL_STACK_SIZE = 128,
    SQUEEZE_THR = 32,
    };
    interval_t* stack_base =
    malloc(INITIAL_STACK_SIZE*sizeof(*stack_base));
    if (!stack_base)
    return -1;
    interval_t* stack_end = &stack_base[INITIAL_STACK_SIZE];
    interval_t* todo = stack_base;

    // recolor initial horizontal interval
    row[x] = new_color;
    // look backward
    int x00;
    for (x00 = x-1; x00 >= 0 && row[x00]==old_color; --x00)
    row[x00] = new_color;
    x00 += 1;
    // look forward
    int x01;
    for (x01 = x+1; x01 < width && row[x01]==old_color; ++x01)
    row[x01] = new_color;
    x01 -= 1;

    // push neighbors of initial interval on todo stack
    for (int from = -1; from <= 1; from += 2) {
    unsigned next_y = y-from;
    if (next_y < (unsigned)height) {
    todo->x0 = x00;
    todo->x1 = x01;
    todo->y = next_y;
    todo->from = from;
    ++todo;
    }
    }

    interval_t* squeezed = stack_base;
    unsigned periodic_i = 0;
    while (todo != stack_base) {
    --todo; // pop interval from todo stack
    int xBeg = todo->x0;
    int xEnd = todo->x1;
    int y = todo->y;

    if (todo < squeezed)
    squeezed = todo;

    // look for target points
    Color* row = &image[y*w];
    int x = xBeg;
    do {
    if (row[x] == old_color) { // target found
    row[x] = new_color;
    int x0 = x;
    if (x == xBeg) {
    // look backward
    for (x0 = x-1; x0 >= 0 && row[x0]==old_color; --x0)
    row[x0] = new_color;
    x0 += 1;
    }
    // look forward
    int x1;
    for (x1 = x+1; x1 < width && row[x1]==old_color; ++x1)
    row[x1] = new_color;
    x1 -= 1;

    int from = todo->from;
    // remaining part of current interval
    if (x1+2 <= xEnd) {
    todo->x0 = x+2;
    todo->x1 = xEnd;
    todo->y = y;
    todo->from = from;
    ++todo;
    }
    // forward continuation
    unsigned next_y = y-from;
    if (next_y < (unsigned)height) {
    todo->x0 = x0;
    todo->x1 = x1;
    todo->y = next_y;
    todo->from = from;
    ++todo;
    }
    // bounces
    y = y+from;
    if (xEnd+2 <= x1) { // bounce on the right side
    todo->x0 = xEnd+2;
    todo->x1 = x1;
    todo->y = y;
    todo->from = -from;
    ++todo;
    }
    if (x0 <= xBeg-2) { // bounce on the left side
    todo->x0 = x0;
    todo->x1 = xBeg-2;
    todo->y = y;
    todo->from = -from;
    ++todo;
    }
    break;
    }
    ++x;
    } while (x <= xEnd);

    ++periodic_i;
    if ((periodic_i & 31)==0) { // maintenance
    if (todo - squeezed >= SQUEEZE_THR) {
    // squeeze empty intervals
    interval_t* wr = squeezed;
    while (squeezed != todo) {
    Color* row = &image[squeezed->y*w];
    for (int x = squeezed->x0; x <= squeezed->x1; ++x) {
    if (row[x] == old_color) { // interval non-empty
    *wr = *squeezed;
    wr->x0 = x;
    ++wr;
    break;
    }
    }
    ++squeezed;
    }
    todo = squeezed = wr;
    }

    if (stack_end-todo < 67) {
    // Allocate more space
    size_t todo_i = todo - stack_base;
    size_t squeezed_i = squeezed - stack_base;
    size_t sz = stack_end - stack_base;
    sz += (sz/128)*64;
    interval_t* tmp = realloc(
    stack_base, sz*sizeof(*stack_base));
    if (!tmp) {
    free(stack_base);
    return -1;
    }
    stack_base = tmp;
    stack_end = &stack_base[sz];
    todo = &stack_base[todo_i];
    squeezed = &stack_base[squeezed_i];
    }
    }
    }

    free(stack_base);
    return 1;
    }
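
    To cross-check any of the floodfill4 variants on small images, one
    option is to compare against a deliberately naive fill. The sketch here
    is not from the posts: floodfill4_fn, ref_floodfill4 and
    same_as_reference are made-up names, and the reference keeps a stack of
    up to width*height pixel indices, so it makes no attempt at the memory
    bounds discussed in this thread. It returns 0 when old_color equals
    new_color, as the original poster's function does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Color;

// Signature shared by the floodfill4 variants in this thread.
typedef int (*floodfill4_fn)(Color *image, int width, int height,
                             int x, int y, Color old_color, Color new_color);

// Hypothetical reference fill: explicit stack of pixel indices.
// A pixel is recolored before it is pushed, so it is pushed at most once
// and width*height slots are always enough.
static inline int ref_floodfill4(Color *image, int width, int height,
                                 int x, int y, Color old_color, Color new_color)
{
    static const int dx[4] = {1, -1, 0, 0};
    static const int dy[4] = {0, 0, 1, -1};
    if (width <= 0 || height <= 0 || x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (old_color == new_color)
        return 0;
    size_t w = (size_t)width;
    if (image[w * (size_t)y + (size_t)x] != old_color)
        return 0;
    size_t *stack = malloc(w * (size_t)height * sizeof *stack);
    if (!stack)
        return -1;
    size_t top = 0;
    image[w * (size_t)y + (size_t)x] = new_color;
    stack[top++] = w * (size_t)y + (size_t)x;
    while (top != 0) {
        size_t idx = stack[--top];
        int px = (int)(idx % w), py = (int)(idx / w);
        for (int k = 0; k < 4; ++k) {
            int nx = px + dx[k], ny = py + dy[k];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            size_t nidx = w * (size_t)ny + (size_t)nx;
            if (image[nidx] == old_color) {
                image[nidx] = new_color; // mark before pushing
                stack[top++] = nidx;
            }
        }
    }
    free(stack);
    return 1;
}

// Run candidate and the reference on separate copies of image.
// Returns 1 when both the return values and the resulting pixels agree.
static inline int same_as_reference(floodfill4_fn candidate, const Color *image,
                                    int width, int height, int x, int y,
                                    Color old_color, Color new_color)
{
    assert(width > 0 && height > 0);
    size_t n = (size_t)width * (size_t)height;
    Color *a = malloc(n), *b = malloc(n);
    int ok = 0;
    if (a && b) {
        memcpy(a, image, n);
        memcpy(b, image, n);
        int ra = candidate(a, width, height, x, y, old_color, new_color);
        int rb = ref_floodfill4(b, width, height, x, y, old_color, new_color);
        ok = ra == rb && memcmp(a, b, n) == 0;
    }
    free(a);
    free(b);
    return ok;
}
```

    A call such as same_as_reference(floodfill4, img, w, h, x, y, old, new)
    over a set of small hand-made or random images is then enough for a
    quick sanity check. Keep old_color different from new_color when
    testing: the versions posted here do not check for that case.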

  • From Michael S@21:1/5 to Michael S on Wed Jun 5 17:45:45 2024
    On Fri, 3 May 2024 18:33:05 +0300
    Michael S <[email protected]> wrote:

    On Thu, 25 Apr 2024 17:56:06 +0300
    Michael S <[email protected]> wrote:


    A solution (sort of) is in line with the famous quote of David Wheeler
    - to turn todo lists from bit maps into arrays of
    abscissas-or-ordinates of contact points.

    The cost is a memory footprint - 4x bigger than the previous version,
    32 times bigger than above-mentioned "packed" variant of the previous version. But in BigO sense it's the same.

    In my tests it reduced the worst case time from O(max(A,B)**3) to
    O(A*B*log(max(A,B))). Which is non-ideal, but probably acceptable,
    because the bad cases should be very rare in practice.

    The real trouble is different - I don't know if my "worst case" is
    really the worst.

    The code below is for presentation of algorithm in both clear and
    compact manner, with emphasis on symmetry between x and y directions.
    It is not optimal in any sense and can be made non-trivially faster
    both by algorithm enhancements and by specialization of critical loops.



    Following code improves on ideas from the previous post.
    Unlike the previous one, it is purely iterative, with no recursion.
    The algorithm is simpler and accesses storage in a more compact manner, i.e.
    all accessed memory area starts from beginning and grows according to
    need. Previous attempt did not have this property.
    It's still longer and less simple than I would like.

    // try+split algorithm with flat storage
    // - horizontal intervals
    // - two stacks: main stack for intervals,
    // auxiliary stack of areas of interest (AoI)
    // - both stacks implemented as arrays
    #include <stddef.h>
    #include <string.h>
    #include <stdlib.h>
    #include <stdint.h>
    #define NDEBUG
    #include <assert.h>
    #include <stdio.h>

    typedef unsigned char Color;

    typedef struct {
    size_t n_intervals;
    size_t n_splits;
    } stack_sizes_t;

    static
    stack_sizes_t floodfill4_calc_stack_size(int width, int height)
    {
    stack_sizes_t sz = { .n_intervals = 1, .n_splits = 1 };
    for (;;) {
    ptrdiff_t len;
    if (width > height) { // split vertically
    len = height;
    width = (width + 1)/2;
    } else { // split horizontally
    len = width;
    height = (height + 1)/2;
    }
    if (len <= 1)
    break;
    sz.n_intervals += len*2 + 4;
    sz.n_splits += 1;
    }
    return sz;
    }

    int floodfill4(
    Color* image,
    int width, int height,
    int x, int y,
    Color old_color, Color new_color)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;

    size_t w = width;
    Color* row = &image[w*y];
    if (row[x] != old_color)
    return 0;

    enum coordinate_axes {
    x_i = 0, y_i, // index of pos[], MS bit of index of limits[]
    };
    #define X2Y(axis) ((axis) ^ 1)
    enum beg_or_end {
    beg_i = 0, end_i // LS bit of index of limits[],
    // I use 0 and 1 more commonly
    };
    enum limits_idx { // index of limits[]
    x0_i = x_i*2+beg_i,
    x1_i = x_i*2+end_i,
    y0_i = y_i*2+beg_i,
    y1_i = y_i*2+end_i,
    };

    typedef struct {
    int x0, x1, y;
    int from; // 0 => from y-1, 1 => from y+1
    } interval_t;
    typedef struct {
    interval_t* parent_todo;
    int saved_limit_val;
    uint8_t saved_limit_idx; // axis*2+beg_or_end
    int frame_capacity_deficit;
    } parent_info_t;

    stack_sizes_t stacks_len = floodfill4_calc_stack_size(width, height);
    const size_t parent_info_sz = stacks_len.n_splits * sizeof(parent_info_t);
    const size_t todo_sz = stacks_len.n_intervals * sizeof(interval_t);
    void* stacks = malloc(parent_info_sz + todo_sz);
    if (!stacks)
    return -1;

    parent_info_t* parents_stack = stacks;
    parent_info_t* parents_stack_end = &parents_stack[stacks_len.n_splits];
    interval_t* todo_stack = (interval_t*)parents_stack_end;
    interval_t* todo = todo_stack;
    #ifndef NDEBUG
    interval_t* todo_stack_end = &todo[stacks_len.n_intervals];
    #endif

    int limits[2*2] = { 0, width-1, 0, height-1}; // {x0, x1, y0, y1};

    // recolor initial horizontal interval
    row[x] = new_color;
    // look backward
    int x00;
    for (x00 = x-1; x00 >= 0 && row[x00]==old_color; --x00)
    row[x00] = new_color;
    x00 += 1;
    // look forward
    int x01;
    for (x01 = x+1; x01 < width && row[x01]==old_color; ++x01)
    row[x01] = new_color;
    x01 -= 1;
    // push neighbors of initial interval on todo stack
    for (enum beg_or_end from = beg_i; from <= end_i; ++from) {
    unsigned next_y = y+1-from*2;
    if (next_y < (unsigned)height) {
    todo->x0 = x00;
    todo->x1 = x01;
    todo->y = next_y;
    todo->from = from;
    ++todo;
    }
    }

    parent_info_t* parent_aoi = parents_stack;
    interval_t* parent_todo = todo_stack;
    ptrdiff_t frame_capacity = width < height ? width : height;
    for (;;) {
    while (todo != parent_todo) {
    assert(todo_stack_end != todo);
    assert(parent_todo >= todo_stack && parent_todo <= todo_stack_end);
    // Get interval from top of todo stack
    --todo; // pop interval from todo stack
    int xBeg = todo->x0;
    int xEnd = todo->x1;
    int y = todo->y;
    int from = todo->from;
    // check range
    if ((unsigned)(y-limits[y0_i]) > (unsigned)(limits[y1_i]-limits[y0_i]) ||
    xEnd < limits[x0_i] || xBeg > limits[x1_i]) {
    // Whole interval belongs to parent
    // Bring value from the bottom of todo stack to the top
    // freeing stack slot for parent stack
    assert(todo_stack_end != todo);
    *todo = *parent_todo; ++todo;
    // Store interval on top of parent stack
    parent_todo->x0 = xBeg;
    parent_todo->x1 = xEnd;
    parent_todo->y = y;
    parent_todo->from = from;
    ++parent_todo;
    continue;
    }
    // At least a part of the interval is in current rectangle
    if (xBeg < limits[x0_i]) {
    // left part of interval belongs to parent
    // Store left part of interval on todo stack
    // for later demotion to parent's stack
    assert(todo_stack_end != todo);
    todo->x0 = xBeg;
    todo->x1 = limits[x0_i]-1;
    todo->y = y;
    todo->from = from;
    ++todo;
    xBeg = limits[x0_i]; // adjust xBeg
    }
    if (xEnd > limits[x1_i] ) {
    // right part of interval belongs to parent
    // Store right part of interval on todo stack
    // for later demotion to parent's stack
    assert(todo_stack_end != todo);
    todo->x0 = limits[x1_i]+1;
    todo->x1 = xEnd;
    todo->y = y;
    todo->from = from;
    ++todo;
    xEnd = limits[x1_i]; // adjust xEnd
    }
    // remaining part of interval is within limits

    // look for target points
    Color* row = &image[y*w];
    int x = xBeg;
    do {
    if (row[x] == old_color) { // target found
    if (todo-parent_todo > frame_capacity) {
    // can't complete floodfill of current rectangle
    // due to space constraints.
    // Split
    const int dLim[] = {
    limits[x1_i]-limits[x0_i],
    limits[y1_i]-limits[y0_i]};
    const enum coordinate_axes axis =
    dLim[x_i] > dLim[y_i] ?
    x_i : // split vertically
    y_i ; // split horizontally
    // select half
    const int hpos0 = (limits[axis*2+0] +
    limits[axis*2+1])/2; // lower split point
    const int pos[2] = { [x_i] = x, [y_i] = y};
    enum beg_or_end src_i = pos[axis] > hpos0;

    // preserve state of current rectangle on parents stack
    assert(parent_aoi != parents_stack_end);
    parent_aoi->parent_todo = parent_todo;
    enum beg_or_end save_i = 1 - src_i;
    enum limits_idx saved_limit_idx = axis*2+save_i;
    parent_aoi->saved_limit_idx = saved_limit_idx;
    parent_aoi->saved_limit_val = limits[saved_limit_idx];
    parent_aoi->frame_capacity_deficit =
    todo-parent_todo - frame_capacity;
    ++parent_aoi;

    // switch processing to selected half of rectangle
    frame_capacity = dLim[X2Y(axis)] + 1;
    limits[saved_limit_idx] = hpos0+src_i;
    parent_todo = todo;
    // push interval on fresh todo stack
    assert(todo_stack_end != todo);
    todo->x0 = x;
    todo->x1 = xEnd;
    todo->y = y;
    todo->from = from;
    ++todo;
    break;
    }

    row[x] = new_color;
    // look forward
    int x1;
    for (x1 = x+1; x1 < width && row[x1]==old_color; ++x1)
    row[x1] = new_color;
    x1 -= 1;

    int x0 = x;
    if (x == xBeg) {
    // look backward
    for (x0 = x-1; x0 >= 0 && row[x0]==old_color; --x0)
    row[x0] = new_color;
    x0 += 1;

    // bounce
    if (x0 <= xBeg-2) {
    assert(todo_stack_end != todo);
    todo->x0 = x0;
    todo->x1 = xBeg-2;
    todo->y = y+from*2-1;
    todo->from = 1-from;
    ++todo;
    }
    }
    // bounce
    if (x1 >= xEnd+2) {
    assert(todo_stack_end != todo);
    todo->x1 = x1;
    todo->x0 = xEnd+2;
    todo->y = y+from*2-1;
    todo->from = 1-from;
    ++todo;
    }
    unsigned next_y = y+1-from*2;
    if (next_y < (unsigned)height) {
    // continuation
    #if 1
    // The following if is not necessary for correctness
    // It is here to speed up a few test cases
    if (y != limits[y0_i+1-from] &&
    x0 >= limits[x0_i] &&
    x1 <= limits[x1_i] &&
    x1+2 > xEnd &&
    todo-parent_todo <= frame_capacity)
    { // Bypass stack
    // Advance vertically in the same direction
    xBeg = x = x0;
    xEnd = x1;
    y = next_y;
    row = &image[y*w];
    continue;
    }
    #endif
    // put new interval on current stack
    assert(todo_stack_end != todo);
    todo->x0 = x0;
    todo->x1 = x1;
    todo->y = next_y;
    todo->from = from;
    ++todo;
    }
    x = x1+1;
    }
    ++x;
    } while (x <= xEnd);
    }

    if (parent_aoi == parents_stack)
    break; // top AOI finished

    // back to parent rectangle
    --parent_aoi;
    parent_todo = parent_aoi->parent_todo;
    limits[parent_aoi->saved_limit_idx]= parent_aoi->saved_limit_val;
    frame_capacity = todo - parent_todo
    - parent_aoi->frame_capacity_deficit;
    }
    assert((void*)todo == (void*)parents_stack_end);

    free(stacks);
    return 1;
    }
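
Two small steps recur throughout the routine above: growing a horizontal run of old_color left and right from a seed pixel, and testing a coordinate against a rectangle side with a single unsigned comparison. The sketch below isolates both so they can be checked on their own. It is not part of the posted function. The names fill_span, in_range and span_t are hypothetical, and Color is assumed to be an unsigned integer type, since its definition is not shown in this part of the post.

```c
#include <assert.h>

typedef unsigned Color;                 /* assumption: real definition not shown */

typedef struct { int x0, x1; } span_t;  /* hypothetical: inclusive run bounds */

/* Recolor the maximal horizontal run of old_color that contains x and
   return its inclusive bounds.
   Preconditions: 0 <= x < width, row[x] == old_color, old_color != new_color. */
static span_t fill_span(Color *row, int width, int x,
                        Color old_color, Color new_color)
{
    assert(row[x] == old_color);
    row[x] = new_color;

    int x0;                             /* look backward */
    for (x0 = x - 1; x0 >= 0 && row[x0] == old_color; --x0)
        row[x0] = new_color;

    int x1;                             /* look forward */
    for (x1 = x + 1; x1 < width && row[x1] == old_color; ++x1)
        row[x1] = new_color;

    /* both loops stop one step past the run, so step back in */
    span_t s = { x0 + 1, x1 - 1 };
    return s;
}

/* lo <= v <= hi with one comparison (requires lo <= hi).
   If v < lo, v - lo is negative and converts to a huge unsigned value,
   so the test fails without a separate lower-bound check. */
static int in_range(int v, int lo, int hi)
{
    return (unsigned)(v - lo) <= (unsigned)(hi - lo);
}
```

For a row holding 1 0 0 0 1 0 0 1, calling fill_span(row, 8, 2, 0, 7) recolors indices 1 through 3 and leaves the second run of zeros alone, because the pixel at index 4 separates the two runs. The unsigned comparison is the same trick the posted code uses when it checks next_y against height.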
