Forum: >>> Magnum BBS <<<

filling area by color atack safety

From fir@21:1/5 to All on Sat Mar 16 05:11:44 2024

i was writing a simple editor (something like paint, but more customized for my eventual needs) for big-pixel (low-resolution) drawing

it turned out within a minute that i need a click action for changing a
given drawn area of one color into another color (because otherwise one
would need to do it by hand, pixel by pixel, and the need to change the
color of a given element is very common)

there is a very simple method of doing it - i mean, i click on a pixel of
a given color, then replace it with my color and call the same function on
the 4 adjacent pixels (it only needs to check whether the pixel is on
screen at all and whether the color to change is that initial color)

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                   unsigned new_color)
{
   if(old_color == new_color) return 0;

   if(XYIsInScreen( x, y))
   if(GetPixelUnsafe(x,y)==old_color)
   {
     SetPixelSafe(x,y,new_color);
     RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
     return 1;
   }

   return 0;
}

it works, but i'm not quite sure how to estimate the safety of this -
incidentally, as i said, i use this editor for low-res graphics like
200x200 pixels or less, and it is only a tool for private use,
yet i have no time to work on it for more than 1-2-3 days i guess, but still

is there maybe a simple way to improve it?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Mar 16 13:55:03 2024

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

--
Ben.


From bart@21:1/5 to Ben Bacarisse on Sat Mar 16 14:41:50 2024

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

I personally find recursion hard work and errors much harder to debug.
It also becomes much more important to show that it will not cause stack
overflow.


From David Brown@21:1/5 to Malcolm McLean on Sat Mar 16 15:40:09 2024

On 16/03/2024 12:33, Malcolm McLean wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom for
my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to
do it by hand pixel by pixel and the need to change color of given
element is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on
adjacent 4 pixels (only need check if it is in screen at all and if
the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
   if(old_color == new_color) return 0;

   if(XYIsInScreen( x, y))
   if(GetPixelUnsafe(x,y)==old_color)
   {
     SetPixelSafe(x,y,new_color);
     RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
     return 1;
   }

   return 0;
}

it work but im not quite sure how to estimate the safety of this -
incidentally as i said i use this editor to low res graphics like
200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but still

is there maybe simple way to improve it?

This is a cheap and cheerful flood fill. And it's easy to get right and
shouldn't fall over. But it makes an awful lot of unnecessary calls,
and on a small system and large image might even blow the stack.

It is not going to lead to stack overflow on any reasonable system. If
the image size is 200 x 200, as the OP said, it will never reach a depth
of more than 400 calls (the maximum path length before back-tracking is inevitable). Even for big images, I can't see it being a problem. I
remember using the same method on a 16K ZX Spectrum as a teenager.

Recursion make programs harder to reason about and prove correct.

As a general statement, that is simply wrong. It is no coincidence that
most provably correct software development is done using functional
programming languages, which are based entirely on recursion. Recursion
maps well to inductive proofs, and avoids variables, and is thus often
much easier to work with for proving code correct.

So a real flood fill doesn't work like that. You use a queue and put the pixels to be filled into that, and trace lines.

That might be a bit more efficient, but not significantly so (at least,
not in your implementation below). You are using a queue instead of the
stack, but it will grow in exactly the same manner.

And here's some code I wrote a while ago. Use that as a pattern. But not
sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove correct
than the OP's original one, and unlikely to be very much faster (it will certainly scale in the same way in both time and memory usage).

There are a variety of different flood-fill algorithms, with different advantages and disadvantages. Speeds will often depend as much on the
way the get/set pixel code works, especially if the flood-fill is on
live displayed data rather than in a buffer off-screen. But typically
you need to get a /lot/ more advanced (i.e., not your algorithm) to
improve on the OP's version by an order of magnitude, so if speed is not essential but understanding that it is correct is important, then it
makes more sense to stick to the original recursive version.
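
For illustration, one of the line-tracing variants mentioned in this exchange is usually called a scanline (or span) fill: each pending seed is expanded into a whole horizontal run, and only the starts of matching runs in the rows above and below are queued. The following is a minimal sketch under stated assumptions, not the code behind the link and not anyone's posted implementation. The names `flood_fill_spans`, `seed_t`, `seed_stack_t`, `push_seed` and `pixel_t`, and the row-major one-byte-per-pixel buffer, are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout for this sketch: pixel (x, y) lives at img[y * w + x]. */
typedef unsigned char pixel_t;

typedef struct { int x, y; } seed_t;
typedef struct { seed_t *data; size_t len, cap; } seed_stack_t;

/* Push a seed, growing the heap-allocated stack as needed.
   Returns 1 on success, 0 if memory could not be obtained. */
static int push_seed(seed_stack_t *s, int x, int y)
{
    if (s->len == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 64;
        seed_t *p = realloc(s->data, ncap * sizeof *p);
        if (p == NULL) return 0;
        s->data = p;
        s->cap = ncap;
    }
    s->data[s->len].x = x;
    s->data[s->len].y = y;
    s->len++;
    return 1;
}

/* 4-connected scanline fill. Returns the number of pixels recoloured,
   or -1 on allocation failure (the image may then be partly filled). */
long flood_fill_spans(pixel_t *img, int w, int h, int x, int y,
                      pixel_t new_color)
{
    assert(img != NULL && w > 0 && h > 0);
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;

    pixel_t old_color = img[(size_t)y * w + x];
    if (old_color == new_color) return 0;

    seed_stack_t st = { NULL, 0, 0 };
    long filled = 0;
    if (!push_seed(&st, x, y)) return -1;

    while (st.len > 0) {
        seed_t s = st.data[--st.len];
        pixel_t *row = img + (size_t)s.y * w;
        if (row[s.x] != old_color) continue;  /* reached via another seed */

        /* Grow the run left and right from the seed, then recolour it. */
        int xl = s.x, xr = s.x;
        while (xl > 0 && row[xl - 1] == old_color) --xl;
        while (xr < w - 1 && row[xr + 1] == old_color) ++xr;
        for (int i = xl; i <= xr; ++i) row[i] = new_color;
        filled += xr - xl + 1;

        /* Queue one seed per matching run in the rows above and below. */
        for (int dy = -1; dy <= 1; dy += 2) {
            int ny = s.y + dy;
            if (ny < 0 || ny >= h) continue;
            const pixel_t *adj = img + (size_t)ny * w;
            int in_run = 0;
            for (int i = xl; i <= xr; ++i) {
                if (adj[i] == old_color) {
                    if (!in_run && !push_seed(&st, i, ny)) {
                        free(st.data);
                        return -1;
                    }
                    in_run = 1;
                } else {
                    in_run = 0;
                }
            }
        }
    }
    free(st.data);
    return filled;
}
```

In this sketch the pending work lives on the heap and is counted in runs rather than pixels, so a blank image needs only a handful of entries. Whether that is worth the extra code for a 200x200 private tool is a judgment call.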


From bart@21:1/5 to fir on Sat Mar 16 19:13:32 2024

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to do
it by hand pixel by pixel and the need to change color of given element
is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on adjacent
4 pixels (only need check if it is in screen at all and if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this -

On my machine, it's OK up to a 400x400 image (starting with all one
colour and filling from the centre with another colour).

At 500x500, I get stack overflow. At 400x400 the maximum recursion
depth is 80,000 calls.

I don't have an alternative ATM, I'm just reporting what I saw with my test
program shown below, since some here don't believe that recursion can be
problematical.

--------------------------
#include <stdio.h>
#include <stdlib.h>

typedef unsigned char byte;

enum {dimx=400};
enum {dimy=dimx};
byte image[dimx][dimy];     /* indexed as image[x][y] throughout */
int maxdepth;

byte getpixel(int x, int y) {
    return image[x][y];
}

void setpixel(int x, int y, byte newcol) {
    image[x][y]=newcol;
}

int onscreen(int x, int y) {
    return x>=0 && x<dimx && y>=0 && y<dimy;
}

void fill(int x, int y, unsigned old_color, unsigned new_color)
{
    /* current call depth; also counts calls that end up filling nothing */
    static int depth=0;

    if(old_color == new_color) return;

    ++depth;
    if (depth>maxdepth) maxdepth=depth;

    if(onscreen(x, y)) {
        //printf("FILL %d %d %d depth:%d\n",x,y, onscreen(x,y), depth);
        if(getpixel(x,y)==old_color)
        {
            setpixel(x,y,(byte)new_color);
            fill(x+1, y, old_color, new_color);
            fill(x-1, y, old_color, new_color);
            fill(x, y-1, old_color, new_color);
            fill(x, y+1, old_color, new_color);
        }
    }
    --depth;
}

/* Write the image as a plain-text PGM (P2). Returns 1 on success, 0 on failure. */
static int writepgm(const char* file) {
    int x, y;
    FILE* f = fopen(file,"w");
    if (f == NULL) return 0;
    fprintf(f,"P2\n");
    fprintf(f,"%d %d\n",dimx,dimy);
    fprintf(f,"255\n");
    for (y=0; y<dimy; ++y) {
        for (x=0; x<dimx; ++x) {
            fprintf(f,"%u ",(unsigned)image[x][y]);
        }
        fprintf(f,"\n");
    }
    fclose(f);
    return 1;
}

int main(void) {

    fill(dimx/2, dimy/2, 0, 80);

    printf("maxdepth=%d\n",maxdepth);
    puts("");

    puts("Writing test.pgm:");
    if (!writepgm("test.pgm")) {
        perror("test.pgm");
        return EXIT_FAILURE;
    }
    return 0;
}


From Scott Lurndal@21:1/5 to David Brown on Sat Mar 16 18:25:37 2024

David Brown <[email protected]> writes:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern. But not
sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove correct

Malcolm can't even spell 'integer' correctly in that code blob :-).

Certainly the intent of Fir's algorithm is easily discerned from
his code. I can't say that about Malcolm's.


From Scott Lurndal@21:1/5 to Malcolm McLean on Sat Mar 16 18:21:38 2024

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

Example given. A recursive algorithm which is hard to reason about and

Perhaps hard for _you_ to reason about. That doesn't
generalize to every other programmer that might read that
code.


From bart@21:1/5 to bart on Sat Mar 16 20:23:57 2024

On 16/03/2024 19:13, bart wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom for
my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to
do it by hand pixel by pixel and the need to change color of given
element is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on
adjacent 4 pixels (only need check if it is in screen at all and if
the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
   if(old_color == new_color) return 0;

   if(XYIsInScreen( x, y))
   if(GetPixelUnsafe(x,y)==old_color)
   {
     SetPixelSafe(x,y,new_color);
     RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
     RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
     return 1;
   }

   return 0;
}

it work but im not quite sure how to estimate the safety of this -

On my machine, it's OK up to a 400x400 image (starting with all one
colour and filling from the centre with another colour).

At 500x500, I get stack overflow. At 400x400 the maximum recursion
depth is 80,000 calls.

For an NxN image filling from the centre, the max depth is N*N/2, or
from one corner, it's N*N.

The depth with an N*1 image starting from one end seems to be just N.

It appears to fill as much as possible (in my tests, all remaining
pixels) before returning from any call, at which point the work is done.

I've just looked in my Computer Graphics Principles and Practice book
(after blowing off the dust), and the algorithm above is exactly the 'FloodFill4' one in the book. It mentions the problems with the stack;
maybe I should have looked in there first.

It talks about better approaches, but it doesn't give a better algorithm
that I can see. Perhaps the OP should just do an online search for one.
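
For illustration only, one common way around the stack problem is to keep the pending pixels on an explicitly allocated stack instead of the call stack. The sketch below is not taken from the book or from any code posted in this thread. The names `flood_fill4_iter` and `pixel_t`, and the row-major one-byte-per-pixel buffer, are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout for this sketch: pixel (x, y) lives at img[y * w + x]. */
typedef unsigned char pixel_t;

/* 4-connected flood fill using a heap-allocated work stack.
   Returns the number of pixels recoloured, or -1 if the work stack
   could not be allocated (the image is untouched in that case). */
long flood_fill4_iter(pixel_t *img, int w, int h, int x, int y,
                      pixel_t new_color)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };

    assert(img != NULL && w > 0 && h > 0);
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;

    pixel_t old_color = img[(size_t)y * w + x];
    if (old_color == new_color) return 0;

    /* A pixel is recoloured at the moment it is pushed, so it can be
       pushed at most once: w * h entries are always enough. */
    size_t *stack = malloc((size_t)w * h * sizeof *stack);
    if (stack == NULL) return -1;

    size_t top = 0;
    long filled = 0;
    img[(size_t)y * w + x] = new_color;
    stack[top++] = (size_t)y * w + x;

    while (top > 0) {
        size_t idx = stack[--top];
        int cx = (int)(idx % (size_t)w);
        int cy = (int)(idx / (size_t)w);
        ++filled;

        for (int i = 0; i < 4; ++i) {
            int nx = cx + dx[i], ny = cy + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            size_t nidx = (size_t)ny * w + nx;
            if (img[nidx] == old_color) {
                img[nidx] = new_color;   /* mark before pushing */
                stack[top++] = nidx;
            }
        }
    }
    free(stack);
    return filled;
}
```

The memory needed is still proportional to the filled area in the worst case, but it comes from the heap, where an allocation failure can be detected and reported instead of crashing the program.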


From Ben Bacarisse@21:1/5 to bart on Sun Mar 17 10:42:37 2024

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

No, which is why I did not make such an assertion.

--
Ben.


From Ben Bacarisse@21:1/5 to Scott Lurndal on Sun Mar 17 10:31:18 2024

[email protected] (Scott Lurndal) writes:

David Brown <[email protected]> writes:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern. But not
sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove correct

Malcolm can't even spell 'integer' correctly in that code blob :-).

As someone with dyslexia I have never liked mocking remarks about
spelling errors. Using "even" suggests that a superficial issue hints
at deeper problems. This is rarely the case.

However, I /would/ urge Malcolm to correct the spelling of Bresenham
since the intent was clearly to credit the discoverer. Also,
misspellings don't play well with library databases.

Certainly the intent of Fir's algorithm is easily discerned from
his code. I can't say that about Malcolms.

I have some reservations about the code, but he posted a link so there
is no indication that he wants a review of it.

--
Ben.


From Ben Bacarisse@21:1/5 to Malcolm McLean on Sun Mar 17 11:25:00 2024

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

Example given. A recursive algorithm which is hard to reason about and
prove correct, because we don't really know whether under perfectly reasonable assumptions it will or will not blow the stack.

Had you offered a proof that your code neither "blows the stack" nor
runs out of any other resource we'd have a starting point for
comparison, but you have not done that.

Mind you, had you done that, we would have something that might
eventually become only one piece of evidence for what is an
astonishingly general remark. Broadly applicable remarks require either broadly applicable evidence or a wealth of distinct cases.

Your "rule" suggests that all reasoning is impeded by the presence of
recursion and I don't think you can support that claim. This is
characteristic of many of your remarks -- they are general "rules" that
often remain rules even when there is evidence to the contrary.

I'll make another point in the hope of clarifying the matter. An
algorithm or code is usually proved correct (or not!) under the
assumption that it has adequate resources -- usually time and storage.
Further reasoning may then be done to determine the resource
requirements since this is so often dependent on context. This
separation is helpful as you don't usually want to tie "correctness" to
some specific installation. The code might run on a system with a
dynamically allocated stack, for example, that has very similar
limitations to "heap" memory.

To put it more generally, we often want to prove properties of code that
are independent of physical constraints. Your remark includes this kind
of reasoning. Did you intend it to?

--
Ben.


From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 14:46:25 2024

On Sat, 16 Mar 2024 11:33:20 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area
of of one color into another color (becouse if no someone would
need to do it by hand pixel by pixel and the need to change color
of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all and
if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
old_color, unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
    SetPixelSafe(x,y,new_color);
    RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
    RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
    return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this - incidentally as i said i use this editor to low res graphics like 200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but
still

is there maybe simple way to improve it?

This is a cheap and cheerful fllod fill. And it's easy to get right
and shouldn't afall over.

Except I don't understand why it works at all.
Can't a fill area have sub-areas that are only connected through a diagonal?


From bart@21:1/5 to Malcolm McLean on Sun Mar 17 12:49:33 2024

On 17/03/2024 12:28, Malcolm McLean wrote:

On 17/03/2024 10:31, Ben Bacarisse wrote:

[email protected] (Scott Lurndal) writes:

David Brown <[email protected]> writes:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern.
But not
sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove correct

Malcolm can't even spell 'integer' correctly in that code blob :-).

As someone with dyslexia I have never liked mocking remarks about
spelling errors. Using "even" suggests that a superficial issue hints
at deeper problems. This is rarely the case.

However, I /would/ urge Malcolm to correct the spelling of Bresenham
since the intent was clearly to credit the discoverer. Also,
misspellings don't play well with library databases.

Certainly the intent of Fir's algorithm is easily discerned from
his code. I can't say that about Malcolms.

I have some reservations about the code, but he posted a link so there
is no indication that he wants a review of it.

The main intent was to help fir. That algorithm does tend to blow the
stack, though of course it depends on the image. However, the worst case is
a pattern which is a pixel-wide line, e.g. a spiral or a maze or a series of
alternate light and dark bands with little gaps at each end. And you
achieve that by filling half the pixels. So for a 100x100 image your
worst case is 10,000 / 2 = 5,000 recursive calls, and the stack is blown.

I'd been planning to create a square spiral. But I found I got N*N/2
behaviour even with a blank image which was filled in from one corner.

After thinking about it, it became obvious that the potential call depth
wasn't the distance to a boundary in any direction, but the number of
pixels in the area to be eventually filled in, which could be a big
chunk of the N*N total.
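
A small sketch of how this observation could be checked on a grid tiny enough that the recursion itself is harmless. It uses the same neighbour order as the code under discussion (x+1, x-1, y-1, y+1). The names `fill_rec` and `measure_depth` and the 16x16 size are assumptions for the example, and the counter tracks only calls that actually recolour a pixel, so it can differ by one from a counter that also counts the final failing call.

```c
#include <assert.h>
#include <string.h>

enum { N = 16 };                  /* small on purpose: depth stays tiny */
static unsigned char grid[N][N];  /* grid[y][x], 0 = not yet filled */
static int depth, max_depth;

/* Recursive 4-connected fill that records how deep the chain of
   pixel-filling calls gets. */
static void fill_rec(int x, int y, unsigned char old_color,
                     unsigned char new_color)
{
    if (old_color == new_color) return;
    if (x < 0 || x >= N || y < 0 || y >= N) return;
    if (grid[y][x] != old_color) return;

    grid[y][x] = new_color;
    if (++depth > max_depth) max_depth = depth;

    fill_rec(x + 1, y, old_color, new_color);
    fill_rec(x - 1, y, old_color, new_color);
    fill_rec(x, y - 1, old_color, new_color);
    fill_rec(x, y + 1, old_color, new_color);

    --depth;
}

/* Clear the grid, fill it from (start_x, start_y), and return the
   deepest chain of nested filling calls that occurred. */
int measure_depth(int start_x, int start_y)
{
    assert(start_x >= 0 && start_x < N && start_y >= 0 && start_y < N);
    memset(grid, 0, sizeof grid);
    depth = max_depth = 0;
    fill_rec(start_x, start_y, 0, 1);
    return max_depth;
}
```

On a blank grid, `measure_depth(0, 0)` returns N*N, because the fill snakes through every row in a single unbroken chain of calls, while `measure_depth(N / 2, N / 2)` returns at least N*N/2. That matches the point above: the depth follows the filled area, not the distance to the nearest boundary.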


From bart@21:1/5 to Michael S on Sun Mar 17 12:54:34 2024

On 17/03/2024 12:46, Michael S wrote:

On Sat, 16 Mar 2024 11:33:20 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area
of of one color into another color (becouse if no someone would
need to do it by hand pixel by pixel and the need to change color
of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all and
if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
old_color, unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this -
incidentally as i said i use this editor to low res graphics like
200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but
still

is there maybe simple way to improve it?

This is a cheap and cheerful fllod fill. And it's easy to get right
and shouldn't afall over.

Except I don't understand why it works it all.
Can't fill area have sub-areas that only connected through diagonal?

Suppose you have an image which is a chessboard. You want to fill one of
the black squares so that it is red.

If you allow connectivity through the diagonals (so two notionally
square pixels that only meet at their corners would be connected), then
all the black squares would turn red, not just one.


From Peter 'Shaggy' Haywood@21:1/5 to All on Sun Mar 17 18:03:57 2024

Groovy hepcat fir was jivin' in comp.lang.c on Sat, 16 Mar 2024 03:11
pm. It's a cool scene! Dig it.

i was writing simple editor (something like paint but more custom for
my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to
do
it by hand pixel by pixel and the need to change color of given
element is very common)

Not really a C question, but I'll forgive that for now.
What you're looking for (and can easily find on Google, Duck Duck Go
or any other search engine, if you but utilise any of those services)
is called a "flood fill" algorithm.
But a word of advice: recursion can be tricky if you don't understand
the effect. Your method creates a very large recursive chain. This is
best avoided. Try it out "by hand". Get a piece of graph paper and draw
some shapes on it, including some complex ones. Now choose one of these
shapes and choose a starting pixel within this area and try applying
your algorithm. With a coloured pencil, colour in each square as you
go, just as the algorithm would. Also make note of the level of
recursion as you go. I think you'll be amazed. Repeat for all the
shapes on your graph paper.

--

----- Dig the NEW and IMPROVED news sig!! -----

-------------- Shaggy was here! ---------------
Ain't I'm a dawg!!


From Michael S@21:1/5 to Scott Lurndal on Sun Mar 17 15:32:03 2024

On Sat, 16 Mar 2024 18:21:38 GMT
[email protected] (Scott Lurndal) wrote:

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

Example given. A recursive algorithm which is hard to reason about
and

Perhaps hard for _you_ to reason about. That doesn't
generalize to every other programmer that might read that
code.

As a matter of fact, David Brown was not able to reason about depth of recursion in fir's code. And you answered David's post without spotting
his mistake.
Now, I don't know if you didn't spot his mistake because you didn't read
this part of his message or because for you too it was hard to reason
about depth of recursion.


From Michael S@21:1/5 to bart on Sun Mar 17 15:15:19 2024

On Sun, 17 Mar 2024 12:54:34 +0000
bart <[email protected]> wrote:

On 17/03/2024 12:46, Michael S wrote:

On Sat, 16 Mar 2024 11:33:20 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed
area of of one color into another color (becouse if no someone
would need to do it by hand pixel by pixel and the need to
change color of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all
and if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
old_color, unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this
- incidentally as i said i use this editor to low res graphics
like 200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but
still

is there maybe simple way to improve it?

This is a cheap and cheerful fllod fill. And it's easy to get right
and shouldn't afall over.

Except I don't understand why it works it all.
Can't fill area have sub-areas that only connected through
diagonal?

Suppose you have an image which is a chessboard. You want to fill one
of the black squares so that it is red.

If you allow connectivity through the diagonals (so two notionally
square pixels that only meet at their corners would be connected),
then all the black squares would turn red, not just one.

That's what I want.
Does fir want something else?


From bart@21:1/5 to Michael S on Sun Mar 17 13:23:55 2024

On 17/03/2024 13:15, Michael S wrote:

On Sun, 17 Mar 2024 12:54:34 +0000
bart <[email protected]> wrote:

On 17/03/2024 12:46, Michael S wrote:

On Sat, 16 Mar 2024 11:33:20 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed
area of of one color into another color (becouse if no someone
would need to do it by hand pixel by pixel and the need to
change color of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all
and if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
old_color, unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this
- incidentally as i said i use this editor to low res graphics
like 200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but
still

is there maybe simple way to improve it?

This is a cheap and cheerful fllod fill. And it's easy to get right
and shouldn't afall over.

Except I don't understand why it works it all.
Can't fill area have sub-areas that only connected through
diagonal?

Suppose you have an image which is a chessboard. You want to fill one
of the black squares so that it is red.

If you allow connectivity through the diagonals (so two notionally
square pixels that only meet at their corners would be connected),
then all the black squares would turn red, not just one.

That's what I want.
Do fir wants something else?

His algorithm is the same as that presented in my textbook, where it is
called FloodFill4.

If I reread the notes I see now the significance of the '4', as it talks
about 4-connected and 8-connected versions.

Presumably you want the 8-connected version, which will have 4 extra
calls for the pixels at each corner.
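
To make the 4-connected versus 8-connected difference concrete, here is a hedged sketch that takes the connectivity as a parameter and uses a heap-allocated work stack rather than recursion. It is not the textbook's FloodFill4 or FloodFill8 code. The names `flood_fill_conn` and `pixel_t`, and the row-major one-byte-per-pixel buffer, are assumptions for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout for this sketch: pixel (x, y) lives at img[y * w + x]. */
typedef unsigned char pixel_t;

/* Flood fill with selectable connectivity (4 or 8).
   Returns the number of pixels recoloured, or -1 on allocation failure. */
long flood_fill_conn(pixel_t *img, int w, int h, int x, int y,
                     pixel_t new_color, int connectivity)
{
    /* The first four offsets are the edge neighbours; the last four
       are the diagonal (corner) neighbours used only when 8-connected. */
    static const int dx[8] = { 1, -1,  0, 0,  1, 1, -1, -1 };
    static const int dy[8] = { 0,  0, -1, 1, -1, 1, -1,  1 };

    assert(img != NULL && w > 0 && h > 0);
    assert(connectivity == 4 || connectivity == 8);
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;

    pixel_t old_color = img[(size_t)y * w + x];
    if (old_color == new_color) return 0;

    /* Each pixel is recoloured when pushed, so w * h slots suffice. */
    size_t *stack = malloc((size_t)w * h * sizeof *stack);
    if (stack == NULL) return -1;

    size_t top = 0;
    long filled = 0;
    img[(size_t)y * w + x] = new_color;
    stack[top++] = (size_t)y * w + x;

    while (top > 0) {
        size_t idx = stack[--top];
        int cx = (int)(idx % (size_t)w);
        int cy = (int)(idx / (size_t)w);
        ++filled;

        for (int i = 0; i < connectivity; ++i) {
            int nx = cx + dx[i], ny = cy + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            size_t nidx = (size_t)ny * w + nx;
            if (img[nidx] == old_color) {
                img[nidx] = new_color;
                stack[top++] = nidx;
            }
        }
    }
    free(stack);
    return filled;
}
```

On an 8x8 chessboard with one pixel per square, filling from a black corner with connectivity 4 recolours just that one pixel, while connectivity 8 recolours all 32 black pixels, which is the behaviour described in the chessboard example earlier in the thread.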


From Michael S@21:1/5 to bart on Sun Mar 17 15:37:42 2024

'4' variant does not appear useful for changing colors of drawn shapes,
like lines or circles. Nor would it work for changing color of text
except when font is unusually bold.


From Ben Bacarisse@21:1/5 to bart on Sun Mar 17 14:10:15 2024

bart <[email protected]> writes:

His algorithm is the same as that presented in my textbook, where it is called FloodFill4.

s/my/his/?

--
Ben.


From Lew Pitcher@21:1/5 to Malcolm McLean on Sun Mar 17 14:27:53 2024

On Sat, 16 Mar 2024 11:33:20 +0000, Malcolm McLean wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom for my
eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to do
it by hand pixel by pixel and the need to change color of given element
is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on adjacent
4 pixels (only need check if it is in screen at all and if the color to
change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                   unsigned new_color)
{
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
    if(GetPixelUnsafe(x,y)==old_color)
    {
        SetPixelSafe(x,y,new_color);
        RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
        RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
        return 1;
    }

    return 0;
}

it work but im not quite sure how to estimate the safety of this -
incidentally as i said i use this editor to low res graphics like
200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but still

is there maybe simple way to improve it?

This is a cheap and cheerful flood fill. And it's easy to get right and
shouldn't fall over. But it makes an awful lot of unnecessary calls,
and on a small system and large image might even blow the stack.

Recursion makes programs harder to reason about and prove correct.

I would have said that those unfamiliar with the concept of recursion
have a harder time reasoning about the effects of recursion, or proving
their recursive code correct.

Take fir's example code above; a simple single call to RecolorizePixelAndAdjacentOnes() will effectively recolour the
origin cell multiple times, because of how the recursion is handled.

As an example:
RecolorizePixelAndAdjacentOnes(0,0,1,2)
will
SetPixelSafe(0,0,2);
then invoke
RecolorizePixelAndAdjacentOnes(1,0,1,2)
which will
SetPixelSafe(1,0,2)
and subsequently invoke
...
RecolorizePixelAndAdjacentOnes(0,0,1,2)
which will
SetPixelSafe(0,0,2);
and then invoke
RecolorizePixelAndAdjacentOnes(1,0,1,2)
etc.

[snip]

--
Lew Pitcher
"In Skills We Trust"


From bart@21:1/5 to Lew Pitcher on Sun Mar 17 15:13:18 2024

On 17/03/2024 14:27, Lew Pitcher wrote:

On Sat, 16 Mar 2024 11:33:20 +0000, Malcolm McLean wrote:

This is a cheap and cheerful flood fill. And it's easy to get right and
shouldn't fall over. But it makes an awful lot of unnecessary calls,
and on a small system and large image might even blow the stack.

Recursion makes programs harder to reason about and prove correct.

I would have said that those unfamiliar with the concept of recursion
have a harder time reasoning about the effects of recursion, or proving
their recursive code correct.

Take fir's example code above; a simple single call to RecolorizePixelAndAdjacentOnes() will effectively recolour the
origin cell multiple times, because of how the recursion is handled.

I don't think so. It may look at the original cell, but it will only
recolour it (and recursively process its neighbours) if the colour
hasn't yet changed to the new one.

If I take a 100x100 image with 10,000 cells, which all have to be filled
to the new colour, then SetPixelSafe is called exactly 10,000 times.

The problem is that most of the work is done along a 10,000-deep chain
of nested calls.


From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 17:42:18 2024

On Sun, 17 Mar 2024 14:56:34 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 15:09, Malcolm McLean wrote:

On 16/03/2024 14:40, David Brown wrote:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern.
But not sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove
correct than the OP's original one, and unlikely to be very much
faster (it will certainly scale in the same way in both time and
memory usage).

Now is this David Brown being David Borwn, ot its it actaully ture?

And I need to run some tests, don't I?

Let's give it a whirl

<snip>

malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
malcolm@Malcolms-iMac cscratch % ./a.out
floodfill_r 1.69274
floodfill4 0.336705

Now try the case in which original recursion is particularly deep.
Something like that:
*.***.**
*.*.*.*.
*.*.*.*.
*.*.*.*.
*.*.*.*.
*.*.*.*.
*.*.*.*.
***.***.


From David Brown@21:1/5 to Malcolm McLean on Sun Mar 17 17:45:16 2024

On 16/03/2024 16:09, Malcolm McLean wrote:

On 16/03/2024 14:40, David Brown wrote:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern. But
not sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove
correct than the OP's original one, and unlikely to be very much
faster (it will certainly scale in the same way in both time and
memory usage).

Now is this David Brown being David Borwn, ot its it actaully ture?

I don't know who "David Borwn" might be, nor what "ture" means. If you
can't type, and can't spell, then at least pay the group the respect of
using a spell-checker.

It's not designed to be easy to prove correct, that's true. And the
main reason it's a mess is that we are managing the queue manually for speed.

It is badly designed code. It is a jumble of wildly different concepts,
thrown together in one huge function with no structure or organisation,
and with meaningless names for the variables and absurd names for the parameters.

The OP's code is simple and obvious, as is its correctness (assuming
reasonable definitions of the pixel access and setting functions) and
its time and space requirements. Yours is not.

Your algorithm could be used in a proper implementation, with separate functions to handle the different parts (such as the stack). The
algorithm itself is not bad, it's the implementation that is the main
problem.

But the naive recursive algorithm is O(N) (N = pixels to flood), and inherently we can't beat that without special hardware.

Assuming you are measuring the number of pixels read or written here,
then that is, I think, correct.

The recursive
one tends to be slow because calls are expensive.

Yes, I agree that recursion can be slow (unless it is simple enough for
the compiler to turn it into a loop). And it typically takes more stack
space than you'd need for a dedicated queue. But whether or not that
makes a significant difference depends on the code in question, and how
much work you are doing within the code. If each step of the algorithm
takes a lot of time anyway, the call overhead will be of less relevance.

I would expect that your code would be several times faster than the
OP's, with similar scaling. But the OP's is understandable and easily
seen to be correct, unlike yours, and correctness trumps speed every time.

And starting from a correct recursive version, it's possible to improve
on it in many ways while retaining correctness.

And mine makes calls
to malloc() and realloc to manage the queue. And of course whilst we
might blow the stack, we are much less likely to run out of heap.

True.

And it's been tweaked a bit in a hacky way to make it faster on real
images. And whilst it's still going to work, is it out of date?

I have no idea if your code is "out of date" or not. It seems to be
written for images consisting of unsigned chars, so I am not sure it was
ever designed for real-world images.

What is clear is that you have taken an okay algorithm - not state of
the art, but not the worst - and made a dog's breakfast of an
implementation in your attempts to micro-optimise. This means you have
code that can't be easily analysed or seen to be correct, cannot be
improved algorithmically, and cannot be expanded or gain new features
without a massive re-write.

And I need to run some tests, don't I?

If you like.

There are a variety of different flood-fill algorithms, with different
advantages and disadvantages. Speeds will often depend as much on the
way the get/set pixel code works, especially if the flood-fill is on
live displayed data rather than in a buffer off-screen. But typically
you need to get a /lot/ more advanced (i.e., not your algorithm) to
improve on the OP's version by an order of magnitude, so if speed is
not essential but understanding that it is correct is important, then
it makes more sense to stick to the original recursive version.

What are these /lot/ more advanced algorithms? Maybe they exist. But
don't people deserve some sort of link?

<https://gprivate.com/6a2yp>

I don't know if it is fair to call them a /lot/ more advanced, but
certainly a bit more advanced. And certainly better implementations are possible.


From David Brown@21:1/5 to Ben Bacarisse on Sun Mar 17 17:59:39 2024

On 17/03/2024 11:31, Ben Bacarisse wrote:

[email protected] (Scott Lurndal) writes:

David Brown <[email protected]> writes:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern. But not
sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove correct

Malcolm can't even spell 'integer' correctly in that code blob :-).

As someone with dyslexia I have never liked mocking remarks about
spelling errors. Using "even" suggests that a superficial issue hints
at deeper problems. This is rarely the case.

However, I /would/ urge Malcolm to correct the spelling of Bresenham
since the intent was clearly to credit the discoverer. Also,
misspellings don't play well with library databases.

I also have dyslexia. I am dependent on a spell checker to spell
accurately. And that is one of the reasons why Malcolm should do a
better job of writing accurately - it is much easier for others to read
posts when the spelling and the grammar is correct. I would, of course,
also like him to do a better job at his grammar, but using a
spell-checker is so simple and low-cost that it is inexcusable for him
not to use one.

I have nothing bad to say about people who can't spell well, or who
can't type well, and I have nothing but respect for people who are
trying their best to write in a second (or third, or more) language.

But Malcolm is a native English speaker with a degree in English. He is
not dyslexic (or at least, the mistakes he makes are not typical signs
of dyslexia). He is simply a poor typist and too lazy to make an effort
to correct his errors.

Certainly the intent of Fir's algorithm is easily discerned from
his code. I can't say that about Malcolm's.

I have some reservations about the code, but he posted a link so there
is no indication that he wants a review of it.


From David Brown@21:1/5 to Michael S on Sun Mar 17 18:05:22 2024

On 17/03/2024 14:37, Michael S wrote:

'4' variant does not appear useful for changing colors of drawn shapes,
like lines or circles. Nor would it work for changing color of text
except when font is unusually bold.

An 8-connected flood fill is typically too much, and a 4-connected flood
fill is often too little. Neither is perfect for all cases, but I think
the 4-connected version is the most commonly used.

And as they stand, both are useless for images that come from
realistic pictures (as distinct from drawings), since real colours
change gradually.

That's why graphics programs have feathered selection and masking, fuzzy
fills, and so on.


From Kaz Kylheku@21:1/5 to Malcolm McLean on Sun Mar 17 17:11:44 2024

On 2024-03-17, Malcolm McLean <[email protected]> wrote:

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited whilst stacks are not.

The strawman, absolutist conventional wisdom that "recursion is always
easier to analyze than iteration" is wrong in the first place.

Any program graph based on nothing but IF and GOTO primitives can be mechanically transliterated into a (tail) recursive structure that has
the same shape, and is no easier to understand.

Your point is not very well made, though. Even though recursion may run
into a resource limit, its structure can still help analyze the logic of
the algorithm apart from that resource issue. The resource issue can be separately analyzed and provisions made for the algorithm to handle the required inputs, and reject others.

Most algorithms (especially ones working with all inputs in memory)
are constrained by resources. The iterative version of that image
processing algorithm might handle larger images than the recursive
one, but there are yet image sizes it won't handle.

The idea of calling algorithm implementations "incorrect" if they have
any limitations on their input sizes and such isn't particularly
informative or useful.

Obviously it is incorrect if something has limitations, and is used
in such a way that they are exceeded. E.g. the C <int> + <int>
operation when a result is implied that is beyond INT_MIN or INT_MAX.
Oops, + is not "correct"; don't use it!

Now, there is a bit of value in algorithms that will successfully
operate on any object that was successfully fit into memory. Do
these really exist though? Pretty much any algorithm implementation
requires some space to do its work, even if that space is small and
fixed. It's possible that the input fit into memory, yet that small and
fixed amount of space is not available.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]


From Michael S@21:1/5 to Malcolm McLean on Sun Mar 17 18:25:20 2024

On Sun, 17 Mar 2024 14:56:34 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 15:09, Malcolm McLean wrote:

On 16/03/2024 14:40, David Brown wrote:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern.
But not sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove
correct than the OP's original one, and unlikely to be very much
faster (it will certainly scale in the same way in both time and
memory usage).

Now is this David Brown being David Borwn, ot its it actaully ture?

And I need to run some tests, don't I?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int floodfill_r(unsigned char *grey, int width, int height, int x,
                int y, unsigned char target, unsigned char dest)
{
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (grey[y*width+x] != target)
        return 0;
    grey[y*width+x] = dest;
    floodfill_r(grey, width, height, x - 1, y, target, dest);
    floodfill_r(grey, width, height, x + 1, y, target, dest);
    floodfill_r(grey, width, height, x, y - 1, target, dest);
    floodfill_r(grey, width, height, x, y + 1, target, dest);

    return 0;
}

/**
  Floodfill4 - floodfill, 4 connectivity.

  @param[in,out] grey - the image (formally it's greyscale but it
                        could be binary or indexed)
  @param width - image width
  @param height - image height
  @param x - seed point x
  @param y - seed point y
  @param target - the colour to flood
  @param dest - the colour to replace it by.
  @returns Number of pixels flooded.
*/
int floodfill4(unsigned char *grey, int width, int height, int x, int y,
               unsigned char target, unsigned char dest)
{
    int *qx = 0;
    int *qy = 0;
    int qN = 0;
    int qpos = 0;
    int qcapacity = 0;
    int wx, wy;
    int ex, ey;
    int tx, ty;
    int ix;
    int *temp;
    int answer = 0;

    if(grey[y * width + x] != target)
        return 0;
    qx = malloc(width * sizeof(int));
    qy = malloc(width * sizeof(int));
    if(qx == 0 || qy == 0)
        goto error_exit;
    qcapacity = width;
    qx[qpos] = x;
    qy[qpos] = y;
    qN = 1;

    while(qN != 0)
    {
        tx = qx[qpos];
        ty = qy[qpos];
        qpos++;
        qN--;

        if(qpos == 256)
        {
            memmove(qx, qx + 256, qN*sizeof(int));
            memmove(qy, qy + 256, qN*sizeof(int));
            qpos = 0;
        }
        if(grey[ty*width+tx] != target)
            continue;
        wx = tx;
        wy = ty;
        while(wx >= 0 && grey[wy*width+wx] == target)
            wx--;
        wx++;
        ex = tx;
        ey = ty;
        while(ex < width && grey[ey*width+ex] == target)
            ex++;
        ex--;

        for(ix=wx;ix<=ex;ix++)
        {
            grey[ty*width+ix] = dest;
            answer++;
        }

        if(ty > 0)
            for(ix=wx;ix<=ex;ix++)
            {
                if(grey[(ty-1)*width+ix] == target)
                {
                    if(qpos + qN == qcapacity)
                    {
                        temp = realloc(qx, (qcapacity + width) * sizeof(int));
                        if(temp == 0)
                            goto error_exit;
                        qx = temp;
                        temp = realloc(qy, (qcapacity + width) * sizeof(int));
                        if(temp == 0)
                            goto error_exit;
                        qy = temp;
                        qcapacity += width;
                    }
                    qx[qpos+qN] = ix;
                    qy[qpos+qN] = ty-1;
                    qN++;
                }
            }
        if(ty < height -1)
            for(ix=wx;ix<=ex;ix++)
            {
                if(grey[(ty+1)*width+ix] == target)
                {
                    if(qpos + qN == qcapacity)
                    {
                        temp = realloc(qx, (qcapacity + width) * sizeof(int));
                        if(temp == 0)
                            goto error_exit;
                        qx = temp;
                        temp = realloc(qy, (qcapacity + width) * sizeof(int));
                        if(temp == 0)
                            goto error_exit;
                        qy = temp;
                        qcapacity += width;
                    }
                    qx[qpos+qN] = ix;
                    qy[qpos+qN] = ty+1;
                    qN++;
                }
            }
    }

    free(qx);
    free(qy);

    return answer;
error_exit:
    free(qx);
    free(qy);
    return -1;
}

int main(void)
{
    unsigned char *image;
    clock_t tick, tock;
    int i;

    image = malloc(100 * 100);
    if (image == NULL)          /* allocation was unchecked originally */
        return EXIT_FAILURE;
    tick = clock();
    for (i = 0 ; i < 10000; i++)
    {
        memset(image, 0, 100 * 100);
        floodfill_r(image, 100, 100, 50, 50, 0, 1);
    }
    tock = clock();
    printf("floodfill_r %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

    tick = clock();
    for (i = 0 ; i < 10000; i++)
    {
        memset(image, 0, 100 * 100);
        floodfill4(image, 100, 100, 50, 50, 0, 1);
    }
    tock = clock();
    printf("floodfill4 %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

    free(image);
    return 0;
}

Let's give it a whirl

malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
malcolm@Malcolms-iMac cscratch % ./a.out
floodfill_r 1.69274
floodfill4 0.336705

I find your performance measurement non-decisive for two reasons:
(1) because your test case is too trivial and probably uncharacteristic
and
(2) because recursive variant could be trivially rewritten in a way
that reduces # of stack memory accesses by factor of 2 or 3.
Like that:

struct recursive_context_t {
    unsigned char *grey;
    int width, height;
    unsigned char target, dest;
};

static void floodfill_r_core(const struct recursive_context_t* context,
                             int x, int y) {
    if (x < 0 || x >= context->width || y < 0 || y >= context->height)
        return;
    if (context->grey[y*context->width+x] == context->target) {
        context->grey[y*context->width+x] = context->dest;
        floodfill_r_core(context, x - 1, y);
        floodfill_r_core(context, x + 1, y);
        floodfill_r_core(context, x, y - 1);
        floodfill_r_core(context, x, y + 1);
    }
}

int floodfill_r(
    unsigned char *grey,
    int width, int height,
    int x, int y,
    unsigned char target, unsigned char dest)
{
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (grey[y*width+x] != target)
        return 0;
    struct recursive_context_t context = {
        .grey = grey,
        .width = width,
        .height = height,
        .target = target,
        .dest = dest,
    };
    floodfill_r_core(&context, x, y);
    return 1;
}


From Michael S@21:1/5 to Michael S on Sun Mar 17 19:39:08 2024

I did my own measurements with snake-like image from my first
response to Malcolm. For this shape, recursive version (after my
improvement) is almost exactly 10 times slower than Malcolm's iterative
code. And it is susceptible to stack overflow, although a little less so
than the original.
Even if in Big Oh sense they are the same, it does look like Malcolm's
variant is decisively faster in practice.

From Ben Bacarisse@21:1/5 to Spiros Bousbouras on Sun Mar 17 22:14:04 2024

Spiros Bousbouras <[email protected]> writes:

On Sun, 17 Mar 2024 14:10:15 +0000
Ben Bacarisse <[email protected]> wrote:

bart <[email protected]> writes:

His algorithm is the same as that presented in my textbook, where it is
called FloodFill4.

s/my/his/?

What is mentioned in <ut4v4r$32mgb$[email protected]> : "I've just looked in my Computer Graphics Principles and Practice book" .

That context seems to have got lost, and MM was quoting from his
textbook (or book at any rate). Thanks for pointing out the missing
context.

--
Ben.

From bart@21:1/5 to Ben Bacarisse on Sun Mar 17 22:21:28 2024

On 17/03/2024 22:14, Ben Bacarisse wrote:

Spiros Bousbouras <[email protected]> writes:

On Sun, 17 Mar 2024 14:10:15 +0000
Ben Bacarisse <[email protected]> wrote:

bart <[email protected]> writes:

His algorithm is the same as that presented in my textbook, where it is
called FloodFill4.

s/my/his/?

What is mentioned in <ut4v4r$32mgb$[email protected]> : "I've just looked in my Computer Graphics Principles and Practice book" .

That context seems to have got lost, and MM was quoting from his
textbook (or book at any rate). Thanks for pointing out the missing
context.

'His' algorithm was the one in the OP posted by fir.

'My' (bart's) textbook was the CGPP book by Foley, van Dam, et al.

From David Brown@21:1/5 to Malcolm McLean on Mon Mar 18 07:58:44 2024

On 17/03/2024 19:28, Malcolm McLean wrote:

On 17/03/2024 16:45, David Brown wrote:

On 16/03/2024 16:09, Malcolm McLean wrote:

The OP's code is simple and obvious, as is its correctness (assuming
reasonable definitions of the pixel access and setting functions) and
its time and space requirements. Yours is not.

Except it is not. You didn't give the right answer for the space requirements.

Unfortunately, I am still fallible - /easier/ does not mean I'll get it
right :-( And I apologise for unhelpfully rushing that and getting it
wrong.

However, I stand by my claim that the recursive version is much easier
to analyse.

Your algorithm could be used in a proper implementation, with separate
functions to handle the different parts (such as the stack). The
algorithm itself is not bad, it's the implementation that is the main
problem.

It's better to have one function. Subroutines have a way of getting lost.

Seriously? "Subroutines get lost" ? So your answer is to put all your
ideas in a mixer and scrunch them up until any semblance of logic and
structure is lost, and the code works (if it does) by trial and error?
And then the whole mess is cut-and-paste duplicated - along with any
subtle bugs it might have - for 8-connected version. And that's better,
in your eyes, than re-using code?

I have no idea if your code is "out of date" or not. It seems to be
written for images consisting of unsigned chars, so I am not sure it
was ever designed for real-world images.

It was written a long time ago. But it is written in a conservative
subset of ANSI C, and so of course it still works, and should work for
a long time to come. But the 256 integer queue tweak might be out of
date. And cache use is far more important now than it was on big
processors. So it might be a bit long in the tooth.

I have been most interested in being able to be sure the algorithm is
correct, rather than considering its absolute (rather than "big O")
efficiency in different systems. It is certainly the case that cache considerations are more relevant now than they used to be on many
systems. And for working on PC's, you would likely dispense with your
growing stack entirely and simply allocate a queue big enough for every
pixel in the image.

And it's part of the binary image library, and it's designed for marking
8- or 4- connected sections of those images by setting the 1 to a
different value. And then further processing. The binary images are
often derived from photographs by Otsu thresholding, which is in the
same library. But they aren't usually meant for human viewing by end users.

I don't know if it is fair to call them a /lot/ more advanced, but
certainly a bit more advanced. And certainly better implementations
are possible.

And are you going to be constructive or not? Suggest one which might be better? Even implement it?

I suggested separating the code into functions - that is /definitely/ constructive. I suggested using sensible names for parameters and
variables (well, the suggestion was implied by my criticism).

And I am also suggesting now that you allocate a queue that is big
enough for every pixel in the image. Much of what you don't touch of
that space, will probably never be physically allocated by the OS,
depending on page sizes and free memory.

And I would also suggest you drop the requirement for coding in an
ancient tongue, and instead switch to reasonably modern C. Make
abstractions for the types and the access functions - it will make the
code far easier to follow, easier to show correct, and easier to modify
and reuse, without affecting efficiency at run-time.
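
A minimal sketch of the "queue big enough for every pixel" suggestion above, assuming the same flat `unsigned char` image layout as `floodfill_r`. The names `floodfill_queue` and `QPoint` are hypothetical and do not come from the thread; this is not Malcolm's routine, just one way the idea could look.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type, not from the thread. */
typedef struct { int x, y; } QPoint;

/* 4-connected flood fill using a FIFO queue sized for every pixel.
   Returns 1 if pixels were changed, 0 if there was nothing to do,
   -1 if the queue could not be allocated. */
int floodfill_queue(unsigned char *grey, int width, int height,
                    int x, int y, unsigned char target, unsigned char dest)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, 1, -1 };

    assert(grey != NULL && width > 0 && height > 0);
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (target == dest || grey[y * width + x] != target)
        return 0;

    /* A pixel is recoloured before it is queued, so it is queued at most
       once and width*height entries can never overflow. */
    QPoint *queue = malloc((size_t)width * (size_t)height * sizeof *queue);
    if (!queue)
        return -1;

    size_t head = 0, tail = 0;
    grey[y * width + x] = dest;
    queue[tail++] = (QPoint){ x, y };

    while (head < tail) {
        QPoint p = queue[head++];
        for (int i = 0; i < 4; i++) {
            int nx = p.x + dx[i], ny = p.y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            if (grey[ny * width + nx] != target)
                continue;
            grey[ny * width + nx] = dest;
            queue[tail++] = (QPoint){ nx, ny };
        }
    }
    free(queue);
    return 1;
}
```

Because the queue never needs to grow, there is no reallocation path to get wrong, at the cost of reserving address space for the whole image up front.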

From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 02:30:59 2024

Michael S <[email protected]> writes:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area
of of one color into another color (becouse if no someone would
need to do it by hand pixel by pixel and the need to change color
of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all and
if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                   unsigned new_color)
{
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
        if(GetPixelUnsafe(x,y)==old_color)
        {
            SetPixelSafe(x,y,new_color);
            RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
            return 1;
        }

    return 0;
}

[...]

Except I don't understand why it works at all.
Can't the fill area have sub-areas that are connected only through a diagonal?

It is customary in raster graphics to count pixels as adjacent
only if they share an edge, not if they just share a corner.
Usually that gives better results; the exceptions tend to need
special handling anyway and not just connecting through
diagonals.
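
To make the edge-versus-corner distinction concrete, here is a small sketch, assuming a tiny fixed-size grid. The function `fill_n` and the constants `GRID_W` / `GRID_H` are hypothetical names for illustration only. With 4 neighbours a region that touches another only at a corner is left alone; with 8 neighbours the fill leaks through the diagonal.

```c
#include <assert.h>

enum { GRID_W = 4, GRID_H = 4 };   /* hypothetical demo size */

/* Recursive fill on a tiny grid. neighbours == 4 treats only edge-sharing
   pixels as adjacent; neighbours == 8 also follows corners.
   Returns the number of pixels changed. */
int fill_n(unsigned char img[GRID_H][GRID_W], int x, int y,
           unsigned char old_c, unsigned char new_c, int neighbours)
{
    /* The first four entries are the edge neighbours, the last four the corners. */
    static const int dx[8] = { 1, -1, 0, 0, 1, 1, -1, -1 };
    static const int dy[8] = { 0, 0, 1, -1, 1, -1, 1, -1 };

    assert(neighbours == 4 || neighbours == 8);
    if (old_c == new_c)
        return 0;
    if (x < 0 || x >= GRID_W || y < 0 || y >= GRID_H || img[y][x] != old_c)
        return 0;

    img[y][x] = new_c;
    int count = 1;
    for (int i = 0; i < neighbours; i++)
        count += fill_n(img, x + dx[i], y + dy[i], old_c, new_c, neighbours);
    return count;
}
```

On a grid holding two 2x2 blocks that meet only at one corner, a fill started in the first block changes 4 pixels with `neighbours == 4` and 8 pixels with `neighbours == 8`.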

From Tim Rentsch@21:1/5 to bart on Mon Mar 18 03:00:32 2024

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are
potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).
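
As one illustration of a pattern whose stack depth stays within log N, here is a sketch of quicksort that recurses only into the smaller partition and loops on the larger one. It is a generic textbook technique, not code from the thread; `quicksort_bounded` and the instrumentation counter `max_depth_seen` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

int max_depth_seen = 0;   /* instrumentation only: deepest nesting observed */

static void swap_int(int *p, int *q) { int t = *p; *p = *q; *q = t; }

/* Sorts a[lo..hi). Each recursive call receives the smaller partition,
   which holds fewer than half of the caller's elements, so the nesting
   depth cannot exceed log2(N) even when the running time degrades. */
void quicksort_bounded(int *a, size_t lo, size_t hi, int depth)
{
    assert(a != NULL && lo <= hi);
    if (depth > max_depth_seen)
        max_depth_seen = depth;

    while (hi - lo > 1) {
        /* Lomuto partition around the middle element. */
        swap_int(&a[lo + (hi - lo) / 2], &a[hi - 1]);
        int pivot = a[hi - 1];
        size_t store = lo;
        for (size_t i = lo; i + 1 < hi; i++)
            if (a[i] < pivot)
                swap_int(&a[i], &a[store++]);
        swap_int(&a[store], &a[hi - 1]);

        if (store - lo < hi - (store + 1)) {
            quicksort_bounded(a, lo, store, depth + 1);     /* smaller: left  */
            lo = store + 1;                                 /* loop on right  */
        } else {
            quicksort_bounded(a, store + 1, hi, depth + 1); /* smaller: right */
            hi = store;                                     /* loop on left   */
        }
    }
}
```

For 1024 elements the nesting depth stays at or below 10, whatever the input order. The simple recursive flood fill has no comparable "smaller half" to recurse into, which is why the same guarantee is not available there.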

From Tim Rentsch@21:1/5 to All on Mon Mar 18 03:04:57 2024

bart <[email protected]> writes:

P.S. You deserve credit for pointing out that the worst case
for flood fill is changing the color of the entire pixel
field. Maybe it was obvious to other people but I appreciate
it.

From Michael S@21:1/5 to Tim Rentsch on Mon Mar 18 14:23:51 2024

On Mon, 18 Mar 2024 03:00:32 -0700
Tim Rentsch <[email protected]> wrote:

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).

The problem in this case is that the maximum depth of recursion is O(N),
where N is the total number of pixels to change color. So far I haven't
found an obvious way to cut the worst case by more than a small factor
without turning the recursive algorithm into something that is
unrecognizably different from the original and requires a proof of
correctness of its own.
The classic 'divide and conquer, smaller part first' strategy does not
appear applicable here, or at least not obviously.
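
The O(N) depth can be observed directly. The sketch below instruments a recursive fill with a depth counter, using the same neighbour order as `floodfill_r_core` (left, right, up, down) on a blank image. The names `fill_counting`, `measure_fill_depth` and the 100x100 demo size are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

enum { DEMO_W = 100, DEMO_H = 100 };   /* hypothetical demo size */

static unsigned char demo_img[DEMO_W * DEMO_H];
static int cur_depth, max_depth;

/* Plain 4-way recursive fill that tracks how deeply the calls which
   actually recolour a pixel are nested. */
static void fill_counting(int x, int y, unsigned char target, unsigned char dest)
{
    if (x < 0 || x >= DEMO_W || y < 0 || y >= DEMO_H)
        return;
    if (demo_img[y * DEMO_W + x] != target)
        return;
    demo_img[y * DEMO_W + x] = dest;
    if (++cur_depth > max_depth)
        max_depth = cur_depth;
    fill_counting(x - 1, y, target, dest);
    fill_counting(x + 1, y, target, dest);
    fill_counting(x, y - 1, target, dest);
    fill_counting(x, y + 1, target, dest);
    --cur_depth;
}

/* Fills a blank image starting at (x, y) and returns the deepest nesting. */
int measure_fill_depth(int x, int y)
{
    assert(x >= 0 && x < DEMO_W && y >= 0 && y < DEMO_H);
    memset(demo_img, 0, sizeof demo_img);
    cur_depth = max_depth = 0;
    fill_counting(x, y, 0, 1);
    return max_depth;
}
```

Started from the corner (0, 0), this neighbour order walks row 0 to the right, drops down, walks row 1 to the left and so on, so every pixel is reached on a single chain of nested calls and the depth equals the pixel count (10000 here).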

From Michael S@21:1/5 to Chris M. Thomasson on Mon Mar 18 14:40:07 2024

On Sun, 17 Mar 2024 13:19:29 -0700
"Chris M. Thomasson" <[email protected]> wrote:

On 3/16/2024 1:29 PM, Chris M. Thomasson wrote:

On 3/16/2024 1:02 PM, Malcolm McLean wrote:

On 16/03/2024 18:21, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove
correct.

Are you prepared to offer any evidence to support this
astonishing statement or can we just assume it's another
Malcolmism?

Example given. A recursive algorithm which is hard to reason
about and

Perhaps hard for _you_ to reason about. That doesn't
generalize to every other programmer that might read that
code.

From experience this one blows the stack, but not always.
Sometimes it's OK to use.

Blowing the stack is not good at all. However, sometimes, I
consider a recursive algorithm easier to understand. So, I build it first... Get it working, _then_ think about an iterative
solution...

Gaining the iterative solution from a working recursive solution is
the fun part!

:^)

I did.
After a bit of polish applied to corners (on x86-64) it consumes
approximately 60 times less extra memory than Malcolm's recursive
variant and is approximately 2.5 times faster than non-naive recursion.
But it is still decisively slower than Malcolm's non-recursive code:
~4x for the 'snake' shape, ~2x for a solid rectangle.
Malcolm's algorithm is simply better than the recursive one,
most likely because it visits already re-colored pixels less often.

For those interested, here is 'explicit stack' variant of recursive
algorithm:

#include <stddef.h>
#include <stdlib.h>

int floodfill_r_explicite_stack(
    unsigned char *grey,
    int width, int height,
    int x, int y,
    unsigned char target, unsigned char dest)
{
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (grey[y*width+x] != target)
        return 0;
    if (target == dest)
        return 0; // nothing to change; also avoids an endless loop

    const ptrdiff_t initial_stack_sz = 256;
    char* stack = malloc(initial_stack_sz*sizeof(*stack));
    if (!stack)
        return -1;
    char* sp = stack;
    char* end_stack = &stack[initial_stack_sz];

    // Each stack entry records which neighbour the "caller" was visiting.
    enum { ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN, };
    for (;;) {
        do {
            if (grey[y*width+x] != target)
                break; // back to caller

            grey[y*width+x] = dest;
            x -= 1;
            // push state to stack
            if (sp == end_stack) { // allocate more stack space
                ptrdiff_t old_sz = sp-stack;
                ptrdiff_t new_sz = old_sz + old_sz/2;
                char* new_stack = realloc(stack, new_sz*sizeof(*stack));
                if (!new_stack) {
                    free(stack); // on failure realloc leaves the old block allocated
                    return -1;
                }
                stack = new_stack;
                sp = &stack[old_sz];
                end_stack = &stack[new_sz];
            }
            *sp++ = ST_LEFT; // recursive call
        } while (x >= 0);

        for (;;) {
            if (sp == stack) { // we are back at top level
                free(stack);
                return 1; // done
            }

            char state = *--sp; // pop stack (back to caller)
            switch (state) {
            case ST_LEFT:
                x += 2;
                if (x < width) {
                    *sp++ = ST_RIGHT; // recursive call
                    break;
                }
                // fall through

            case ST_RIGHT:
                x -= 1;
                y -= 1;
                if (y >= 0) {
                    *sp++ = ST_UP; // recursive call
                    break;
                }
                // fall through

            case ST_UP:
                y += 2;
                if (y < height) {
                    *sp++ = ST_DOWN; // recursive call
                    break;
                }
                // fall through

            case ST_DOWN:
                y -= 1;
                continue; // back to caller
            }
            break;
        }
    }
}

From David Brown@21:1/5 to Malcolm McLean on Mon Mar 18 17:28:41 2024

On 18/03/2024 10:26, Malcolm McLean wrote:

On 18/03/2024 06:58, David Brown wrote:

On 17/03/2024 19:28, Malcolm McLean wrote:

On 17/03/2024 16:45, David Brown wrote:

On 16/03/2024 16:09, Malcolm McLean wrote:

The OP's code is simple and obvious, as is its correctness (assuming
reasonable definitions of the pixel access and setting functions)
and its time and space requirements. Yours is not.

Except it is not. You didn't give the right answer for the space
requirements.

Unfortunately, I am still fallible - /easier/ does not mean I'll get
it right :-( And I apologise for unhelpfully rushing that and getting
it wrong.

However, I stand by my claim that the recursive version is much easier
to analyse.

In this case it's very short code and easy to see that it is right, so a
win for recursion.

To be clear - I don't claim that recursive code is /always/ easier to
analyse. I claim it is easier in this case, and in many other cases
(thus countering your claim that it is /always/ harder).

Except that it is only right if the stack is bigger
than N/2 calls deep, where N is the number of pixels in the image.

It is completely normal for correctness proofs to make assumptions about
things like resources. An analysis of your code for correctness would
also generally assume that the heap would be big enough - if the heap
runs out, your code will not correctly flood-fill the image. Analysis
of efficiency in time and space is a separate issue - related, but
separate. Things like maximum recursion depth (and heap size) are very implementation-specific, and thus need to be considered separately from
the algorithm itself.

And while this code is in C, the same algorithm could be implemented in
other languages. A language that uses a VM might be fine with a much
higher recursion depth - or it might be much lower. A language for
which recursion is a major tool (such as a functional programming
language) might automatically convert some recursive code to a
queue-based non-recursive solution. (I'd be impressed to see one do
that for this algorithm, however.)

Now a
100x100 woked fine an my machine - I just checked the main stack, and
it's 8MB by default. BUt of cuuurse the bigger than machine, the bigger
the image th euser might want to load.

You still haven't considered using a spell-checker, even though you use
a news client with one built in? Perhaps you need a better keyboard?

It's better to have one function. Subroutines have a way of getting
lost.

Seriously? "Subroutines get lost" ? So your answer is to put all
your ideas in a mixer and scrunch them up until any semblance of logic
and structure is lost, and the code works (if it does) by trial and
error? And then the whole mess is cut-and-paste duplicated - along
with any subtle bugs it might have - for 8-connected version. And
that's better, in your eyes, than re-using code?

Exactly. If a routine is a leaf, you can cut and paste it and use it where
you will. If you have to take subroutines, you've got to explore the
code to understand what you need to take, then you have to put them
somewhere. So it's better to keep routines leaf if possible and fold a
few trivial operations into the code body, even if ideally they would be
subroutines. And I understand these trade-offs.

That is a, shall we say, "interesting" attitude.

I have been most interested in being able to be sure the algorithm is
correct, rather than considering its absolute (rather than "big O")
efficiency in different systems. It is certainly the case that cache
considerations are more relevant now than they used to be on many
systems. And for working on PC's, you would likely dispense with your
growing stack entirely and simply allocate a queue big enough for
every pixel in the image.

That is an idea. But a bit extravagant. I'd like to try to work out
how much queue is actually used in the typical as well as the worst case.

The worst case is either going to be the stripy path example given by
Michael S., or a completely blank image - it depends on how the
east-west stripes affect the queue depth. It should not be hard to try
these. So that would be either approximately half the total pixel
count, or the total pixel count. And I can't think how you could
specify a "typical" image and "typical" flood fill request - without
specifying this in some way, you need to collect lots of statistics of real-world use, or it's mere guesswork.
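
A small harness along these lines could be used to try both candidate images. This sketch measures a plain FIFO breadth-first fill, not Malcolm's routine (which is not shown in this part of the thread), so its numbers say nothing definite about his queue usage. The stripy image layout, `make_stripes` and `bfs_peak` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int x, y; } Pt;   /* hypothetical helper type */

/* Hypothetical stripy image: even rows are open corridor (0), odd rows are
   wall (1) with a single gap at alternating ends, giving one long
   serpentine path through the image. */
void make_stripes(unsigned char *img, int w, int h)
{
    memset(img, 0, (size_t)w * (size_t)h);
    for (int y = 1; y < h; y += 2) {
        memset(img + (size_t)y * w, 1, (size_t)w);
        img[(size_t)y * w + ((y / 2) % 2 == 0 ? w - 1 : 0)] = 0;
    }
}

/* Breadth-first fill of the 0-region containing (x, y) with colour 2.
   Returns the peak number of queued pixels, or -1 if allocation fails;
   *filled receives the number of pixels recoloured. */
long bfs_peak(unsigned char *img, int w, int h, int x, int y, long *filled)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, 1, -1 };

    assert(x >= 0 && x < w && y >= 0 && y < h && img[y * w + x] == 0);
    Pt *q = malloc((size_t)w * (size_t)h * sizeof *q);
    if (!q)
        return -1;

    size_t head = 0, tail = 0, peak = 0;
    img[y * w + x] = 2;
    q[tail++] = (Pt){ x, y };
    while (head < tail) {
        if (tail - head > peak)
            peak = tail - head;          /* occupancy before this dequeue */
        Pt p = q[head++];
        for (int i = 0; i < 4; i++) {
            int nx = p.x + dx[i], ny = p.y + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h || img[ny * w + nx] != 0)
                continue;
            img[ny * w + nx] = 2;
            q[tail++] = (Pt){ nx, ny };
        }
    }
    *filled = (long)tail;
    free(q);
    return (long)peak;
}
```

With this plain FIFO the serpentine image keeps the queue tiny when started at one end of the path, while a blank image filled from the centre produces a larger, frontier-shaped queue. A different queue discipline may behave quite differently.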

I suggested separating the code into functions - that is /definitely/
constructive. I suggested using sensible names for parameters and
variables (well, the suggestion was implied by my criticism).

And I am also suggesting now that you allocate a queue that is big
enough for every pixel in the image. Much of what you don't touch of
that space, will probably never be physically allocated by the OS,
depending on page sizes and free memory.

And I would also suggest you drop the requirement for coding in an
ancient tongue, and instead switch to reasonably modern C. Make
abstractions for the types and the access functions - it will make the
code far easier to follow, easier to show correct, and easier to
modify and reuse, without affecting efficiency at run-time.

And of course the entire binary image library has a consistent style.
And we don't want the user messing about with writing his own getpixel
/ setpixel functions, though there would be a case for that for a
general purpose flood fill.

That would be the "royal we", I presume? I know /I/ would have no use
for a flood-fill routine that did not support colour styles I use.

From Scott Lurndal@21:1/5 to Malcolm McLean on Mon Mar 18 17:42:02 2024

Malcolm McLean <[email protected]> writes:

On 18/03/2024 16:28, David Brown wrote:

On 18/03/2024 10:26, Malcolm McLean wrote:

It is completely normal for correctness proofs to make assumptions about
things like resources. An analysis of your code for correctness would
also generally assume that the heap would be big enough - if the heap
runs out, your code will not correctly flood-fill the image. Analysis
of efficiency in time and space is a separate issue - related, but
separate. Things like maximum recursion depth (and heap size) are very
implementation-specific, and thus need to be considered separately from
the algorithm itself.

It's trivial to engineer a system with a large stack and very small
heap. But unlikely anyone would actually do so on a system on which
floodfill would run.

The first sentence is correct. Although with modern systems, 'small'
is relative (my 12 year old workstation has 16GB RAM) and defaults
to an 8MB stack, which can easily be increased on a per process or
per user basis.

The second is your opinion. What evidence do you have that
your opinion is fact?
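
On POSIX systems the stack limit mentioned above can be queried at run time with `getrlimit(RLIMIT_STACK, ...)`. The sketch below is POSIX-only; the helper names `stack_soft_limit` and `depth_probably_fits`, and the rule of thumb of using at most half the limit, are assumptions for illustration rather than anything proposed in the thread.

```c
#include <assert.h>
#include <sys/resource.h>

/* Returns the soft stack limit in bytes, or 0 if it is unlimited or
   cannot be determined. */
unsigned long long stack_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Rough check: would `depth` nested calls of about `frame_bytes` each stay
   within half of the current limit? The frame size is a guess that depends
   on compiler, options and ABI. */
int depth_probably_fits(unsigned long long depth, unsigned long long frame_bytes)
{
    assert(frame_bytes > 0);
    unsigned long long limit = stack_soft_limit();
    if (limit == 0)
        return 1;   /* unlimited or unknown: cannot rule it out */
    return depth <= (limit / 2) / frame_bytes;
}
```

A library routine generally cannot raise the limit on behalf of its caller in any reliable way, which is part of the argument elsewhere in the thread for not depending on it.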

And while this code is in C, the same algorithm could be implemented in
other languages. A language that uses a VM might be fine with a much
higher recursion depth - or it might be much lower. A language for
which recursion is a major tool (such as a functional programming
language) might automatically convert some recursive code to a
queue-based non-recursive solution. (I'd be impressed to see one do
that for this algorithm, however.)

Now a 100x100 woked fine an my machine - I just checked the main
stack, and it's 8MB by default. BUt of cuuurse the bigger than
machine, the bigger the image th euser might want to load.

You still haven't considered using a spell-checker, even though you use
a news client with one built in? Perhaps you need a better keyboard?

I'll try it out. Since you're dyslexic.

I believe you're conflating David with someone else
who made that claim.

Normal readers can read English

Ah, a not-so-subtle insult to those who happen to suffer from dyslexia.

text with just the initial and terminal letters right and the rest
jumbled, at similar speed to normal text.

Pixels usually represent objects. Take a glance around you. How many
objects of a similar colour are spider's webs, lace curtains, long
wires, and so on? And how many are pieces of paper, coffee cups,
computer mice, and so on? And of course it mustn't fall over on the
unusual objects, but the main consideration is usually that it is fast
and efficient on the common ones.

The main consideration must be that it is sufficient, readable and
maintainable. Fast and efficient aren't always a driving goal,
particularly for rarely used operations.

From bart@21:1/5 to Scott Lurndal on Mon Mar 18 18:50:40 2024

On 18/03/2024 17:42, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 18/03/2024 16:28, David Brown wrote:

On 18/03/2024 10:26, Malcolm McLean wrote:

It is completely normal for correctness proofs to make assumptions about
things like resources. An analysis of your code for correctness would
also generally assume that the heap would be big enough - if the heap
runs out, your code will not correctly flood-fill the image. Analysis
of efficiency in time and space is a separate issue - related, but
separate. Things like maximum recursion depth (and heap size) are very
implementation-specific, and thus need to be considered separately from
the algorithm itself.

It's trivial to engineer a system with a large stack and very small
heap. But unlikely anyone would actually do so on a system on which
floodfill would run.

The first sentence is correct. Although with modern systems, 'small'
is relative (my 12 year old workstation has 16GB RAM) and defaults
to an 8MB stack, which can easily be increased on a per process or
per user basis.

The second is your opinion. What evidence do you have that
your opinion is fact?

It seems the most likely. People don't run programs whose sole purpose
is to floodfill, so that they can request a huge stack.

It will likely be part of a much larger application with conventional
stack usage.

The floodfill may be part of a library, and itself wrapped by another
library that the application knows about.

It is even possible that when the application is built, it doesn't know
that a floodfill routine is to be called. (For example, an interpreter
that will run a program that /might/ call a floodfill routine.)

As the author of such a routine, you don't want to have to rely on a
stack large enough to cope with, say, a 30Mpix image which might need a
30M-deep maximum call-depth, which could easily use up 500MB of stack
memory.
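
The arithmetic behind such an estimate is simply one stack frame per pixel in the worst case. The sketch below spells it out; `worst_case_stack_bytes` is a hypothetical helper, and the per-frame sizes used with it are assumptions, since the real figure depends on compiler, options and ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case stack use of the simple recursive fill, assuming the nesting
   depth can reach one call per pixel and each call costs frame_bytes. */
uint64_t worst_case_stack_bytes(uint64_t width, uint64_t height,
                                uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return width * height * frame_bytes;
}
```

For example, a 6000x5000 (30 Mpix) image at an assumed 16 bytes per frame comes to 480,000,000 bytes, which is in line with the roughly 500MB figure quoted above. The 200x200 image from the original question at an assumed 48 bytes per frame comes to 1,920,000 bytes, comfortably below an 8MB default stack.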

From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 11:36:29 2024

Michael S <[email protected]> writes:

On Sun, 17 Mar 2024 18:25:20 +0200
Michael S <[email protected]> wrote:

On Sun, 17 Mar 2024 14:56:34 +0000
Malcolm McLean <[email protected]> wrote:

[a floodfill routine posted by Malcolm]

[...]

[a recursive area fill written by Michael S]

I did my own measurements with the snake-like image from my first
response to Malcolm. For this shape, the recursive version (after my
improvement) is almost exactly 10 times slower than Malcolm's
iterative code. It is also still susceptible to stack overflow,
although a little less so than the original.

It's hard to write a recursive area fill routine if one wants to
guarantee worst case behavior in all cases. This problem is not
a good fit to using recursion without there being some kind of
constraints on what the inputs will be.

Even if in Big Oh sense they are the same, it does look like
Malcolm's variant is decisively faster in practice.

I've done some tests with Malcolm's code. Some observations:

It uses more memory than it needs to.

It's anisotropic, which is to say it behaves differently with
respect to changes in width than it does to changes in height.

It doesn't scale well. In particular worst case performance
scaling is worse than O(N) (as determined experimentally, not
theoretically).

The code is much longer than is needed just to do an area fill.
A small fraction of that is simply layout style, but mostly it's
that the code is more complicated than it needs to be.

From Tim Rentsch@21:1/5 to fir on Mon Mar 18 13:09:08 2024

fir <[email protected]> writes:

i was writing simple editor (something like paint but more custom for
my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to
do it by hand pixel by pixel and the need to change color of given
element is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on
adjacent 4 pixels (only need check if it is in screen at all and if
the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
                                   unsigned new_color)
{
    if(old_color == new_color) return 0;

    if(XYIsInScreen( x, y))
        if(GetPixelUnsafe(x,y)==old_color)
        {
            SetPixelSafe(x,y,new_color);
            RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
            RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
            return 1;
        }

    return 0;
}

it work but im not quite sure how to estimate the safety of this - incidentally as i said i use this editor to low res graphics like
200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but still

is there maybe simple way to improve it?

As others have explained using simple recursion like this
runs the risk of producing a stack overflow.

Here is a short routine that uses allocated memory rather than
recursion, and so does not have the stack overflow risk that
the above recursive routine does.

The code below uses a slightly different interface to access
the pixel field but I expect you can see how to adapt it to
your interface.

Also the code uses a variably modified type in two places. It
should be easy to change the code to use ordinary types rather
than variably modified types if it's important to do that in
your environment. And it may be the case that changing to use
a different interface to access and change the pixel field will
get rid of the variably modified types so that they wouldn't be
needed anyway.

Oh, before I forget. If someone doesn't like using a single
fixed-size allocated area, it isn't hard to change the code
so that the allocated area grows as needed (and starting with
a smaller size, presumably). I leave doing that as an
exercise.

The code:

#include <assert.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memmove */

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    /* -1 converts to UINT_MAX, so adding it steps back by one (unsigned wraparound) */
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    Index k = 0;
    Index n = (w+h) *17 /16 +10;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) ) todo[k++] = p0;

    while( k > 0 ){
        /* move the pending points to the top of the buffer, then refill from the bottom */
        Index j = n-k;
        memmove( todo + j, todo, k * sizeof *todo );
        k = 0;

        while( j < n ){
            Point p = todo[ j++ ];
            for( Index i = 0; i < 4; i++ ){
                Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                assert( j > k );
                todo[ k++ ] = q;
            }
        }
    }

    free( todo );
}

/* Recolors p and returns 1 if p is inside the field and still has the old color. */
_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
}
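
As a sketch of the "ordinary types" alternative mentioned before the code, the variably modified parameter `Color pixels[w][h]` can be replaced by a flat pointer with the index computed by hand. `change_it_flat` is a hypothetical name; the layout `pixels[x*h + y]` is chosen to match how `Color [w][h]` is laid out in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;

/* Same test-and-set as change_it, but on a flat array of w*h pixels.
   Returns true if p was inside the field, had the old color, and was
   recolored. Coordinates that wrapped around below zero show up as huge
   unsigned values and are rejected by the bounds test. */
bool change_it_flat(UI w, UI h, Color *pixels, Point p,
                    Color old_color, Color new_color)
{
    assert(pixels != NULL);
    if (p.x >= w || p.y >= h)
        return false;
    Color *px = &pixels[(size_t)p.x * h + p.y];
    if (*px != old_color)
        return false;
    *px = new_color;
    return true;
}
```

A caller holding a `Color field[w][h]` array can pass `&field[0][0]` to this variant, so the fill loop itself would only need its calls changed.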

From Tim Rentsch@21:1/5 to Michael S on Mon Mar 18 14:13:19 2024

Michael S <[email protected]> writes:

On Mon, 18 Mar 2024 03:00:32 -0700
Tim Rentsch <[email protected]> wrote:

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are
potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).

The problem in this case is that max. depth of recursion is O(N)
where N is total number of pixels to change color. So far I
didn't find an obvious way to cut the worst case by more than
small factor without turning recursive algorithm into something
that is unrecognizably different from the original and requires a proof
of correctness of its own. The classic "divide and conquer, smaller
part first" strategy does not appear applicable here, or at least
not obviously.

Right. I said as much in another reply to you. This problem
is not well suited to a recursive solution.

To clarify my earlier comment, when I say I routinely use
recursion I do not mean I always use recursion. Part of
understanding programming in a functional style is knowing
when not to use recursion as well as when to use it.
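
To illustrate the log N bound mentioned above with something other than
flood fill: the sketch below is a quicksort that recurses only into the
smaller partition and loops on the larger one, so the call depth stays
logarithmic even on unlucky input. It is an illustration, not code from
this thread; sort_range, partition and the max_depth_seen counter are
hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instrumentation: deepest recursion level that did real work. */
static unsigned max_depth_seen;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition of v[lo..hi] (inclusive); returns the pivot's final index. */
static size_t partition(int *v, size_t lo, size_t hi) {
    int pivot = v[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (v[j] < pivot) swap_int(&v[i++], &v[j]);
    swap_int(&v[i], &v[hi]);
    return i;
}

/* Sort the half-open range v[lo..hi). Only the smaller side is handled
   by a recursive call; the larger side is handled by the while loop. */
static void sort_range(int *v, size_t lo, size_t hi, unsigned depth) {
    assert(lo <= hi);
    while (hi - lo > 1) {
        if (depth > max_depth_seen) max_depth_seen = depth;
        size_t p = partition(v, lo, hi - 1);
        if (p - lo < hi - (p + 1)) {
            sort_range(v, lo, p, depth + 1);      /* left side is smaller */
            lo = p + 1;
        } else {
            sort_range(v, p + 1, hi, depth + 1);  /* right side is smaller */
            hi = p;
        }
    }
}
```

Each recursive call receives less than half of its caller's range, which
is where the logarithmic depth comes from. Flood fill has no comparable
"smaller half" to recurse into, which is the difficulty described above.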


From Tim Rentsch@21:1/5 to All on Mon Mar 18 22:42:14 2024

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

#include <stdlib.h>
#include <string.h>

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
UI k = 0;
UI n = 17;
Point *todo = malloc( n * sizeof *todo );

if( todo && change_it( w, h, pixels, p0, old, new ) ) todo[k++] = p0;

while( k > 0 ){
Index j = n-k;
memmove( todo + j, todo, k * sizeof *todo );
k = 0;

while( j < n ){
Point p = todo[ j++ ];
for( Index i = 0; i < 4; i++ ){
Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
if( ! change_it( w, h, pixels, q, old, new ) ) continue;
todo[ k++ ] = q;
}

if( j-k < 3 ){
Index new_n = n+n/4;
Index new_j = new_n - (n-j);
Point *t = realloc( todo, new_n * sizeof *t );
if( !t ){ k = 0; break; }
memmove( t + new_j, t + j, (n-j) * sizeof *t );
todo = t, n = new_n, j = new_j;
}
}
}

free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
return pixels[p.x][p.y] = new, 1;
}


From Tim Rentsch@21:1/5 to Malcolm McLean on Mon Mar 18 22:54:37 2024

Malcolm McLean <[email protected]> writes:

On 18/03/2024 09:30, Tim Rentsch wrote:

Michael S <[email protected]> writes:

[...]

Except I don't understand why it works at all.
Can't the fill area have sub-areas that are connected only through
a diagonal?

It is customary in raster graphics to count pixels as adjacent
only if they share an edge, not if they just share a corner.
Usually that gives better results; the exceptions tend to need
special handling anyway and not just connecting through
diagonals.

Though with a binary image, if the foreground is 4-connected, the
background must therefore be 8-connected.

It might be but it doesn't have to be.

Also different terminology should be used, since 4-connected
(also N-connected, for other integer N) has a specific meaning in
graph theory, and one very different than what is meant above.
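
The difference between edge adjacency and edge-or-corner adjacency can
be made concrete with a small sketch. It counts the pixels reachable
from a start pixel using either the four edge neighbours or all eight
neighbours. The names reachable, DX and DY and the fixed 4x4 size are
assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>

enum { W = 4, H = 4 };

/* Neighbour offsets: the first four share an edge, the last four only a corner. */
static const int DX[8] = { 1, -1, 0, 0, 1, 1, -1, -1 };
static const int DY[8] = { 0, 0, 1, -1, 1, -1, 1, -1 };

/* Count pixels with value 1 reachable from (x0,y0) using the first
   `nbrs` offsets: 4 = edge neighbours only, 8 = edges and corners.
   The image is not modified; a local map marks visited pixels. */
static int reachable(const unsigned char img[H][W], int x0, int y0, int nbrs) {
    assert(nbrs == 4 || nbrs == 8);
    assert(x0 >= 0 && x0 < W && y0 >= 0 && y0 < H);

    bool seen[H][W] = { { false } };
    int stack_x[W * H], stack_y[W * H];   /* each pixel is pushed at most once */
    int top = 0, count = 0;

    if (img[y0][x0] != 1) return 0;
    seen[y0][x0] = true;
    stack_x[top] = x0; stack_y[top] = y0; top++;

    while (top > 0) {
        top--;
        int x = stack_x[top], y = stack_y[top];
        count++;
        for (int i = 0; i < nbrs; i++) {
            int nx = x + DX[i], ny = y + DY[i];
            if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
            if (img[ny][nx] != 1 || seen[ny][nx]) continue;
            seen[ny][nx] = true;
            stack_x[top] = nx; stack_y[top] = ny; top++;
        }
    }
    return count;
}
```

On an image whose only set pixels lie on the main diagonal, the
edge-only rule reaches just the start pixel, while the eight-neighbour
rule reaches the whole diagonal.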


From Tim Rentsch@21:1/5 to Malcolm McLean on Mon Mar 18 23:10:21 2024

Malcolm McLean <[email protected]> writes:

On 18/03/2024 18:36, Tim Rentsch wrote:

It doesn't scale well. In particular worst case performance
scaling is worse than O(N) (as determined experimentally, not
theoretically).

Is that because the queue is being memmoved instead of using a
circular buffer when it gets towards the end?

I'm sure I don't know, and I'm astonished that you would ask.
It's your code after all. IMO it should simply be thrown out and
re-written; it pains me just to look at it, let alone to try to
understand or fix it.


From David Brown@21:1/5 to Malcolm McLean on Tue Mar 19 11:32:06 2024

On 18/03/2024 18:25, Malcolm McLean wrote:

On 18/03/2024 16:28, David Brown wrote:

On 18/03/2024 10:26, Malcolm McLean wrote:

It is completely normal for correctness proofs to make assumptions
about things like resources. An analysis of your code for correctness
would also generally assume that the heap would be big enough - if the
heap runs out, your code will not correctly flood-fill the image.
Analysis of efficiency in time and space is a separate issue -
related, but separate. Things like maximum recursion depth (and heap
size) are very implementation-specific, and thus need to be considered
separately from the algorithm itself.

It's trivial to engineer a system with a large stack and very small
heap. But unlikley anyone would actually do so on a system on which
floodfill would run.

It is unlikely (that is how the word is spelt) that people would use
your code for a flood fill either - but I fully agree that it is
unlikely that a simple recursive flood fill algorithm is much use as a practical way to do flood fills in any real-world software.

And while this code is in C, the same algorithm could be implemented
in other languages. A language that uses a VM might be fine with a
much higher recursion depth - or it might be much lower. A language
for which recursion is a major tool (such as a functional programming
language) might automatically convert some recursive code to a
queue-based non-recursive solution. (I'd be impressed to see one do
that for this algorithm, however.)

Now a 100x100 woked fine an my machine - I just checked the main
stack, and it's 8MB by default. BUt of cuuurse the bigger than
machine, the bigger the image th euser might want to load.

You still haven't considered using a spell-checker, even though you
use a news client with one built in? Perhaps you need a better keyboard?

I'll try it out. Since you're dyslexic. Normal readers can read English
text with just the initial and terminal letters right and the rest
jumbled, at similar speed to normal text.

You are fond of making up "Malcolm facts" about the cognitive effort
involved in understanding things like nested parentheses. Poor
spelling, typos, grammatical errors, and the like similarly increase the cognitive effort in reading your posts. When it takes too much effort
to figure out what you are trying to say, it is not worth the bother.

I am not looking for perfection here - mistakes happen. I am merely
looking for a minimum of effort on your part, such as using the
spell-checker built into your newsreader. And I don't expect you to do
this for /me/, or because I or anyone else is dyslexic. I consider it a
basic level of politeness and respect for others. I find it
extraordinary that you have been so reluctant to take this step before now.

The worst case is either going to be the stripy path example given by
Michael S., or a completely blank image - it depends on how the
east-west stripes affect the queue depth. It should not be hard to
try these. So that would be either approximately half the total pixel
count, or the total pixel count. And I can't think how you could
specify a "typical" image and "typical" flood fill request - without
specifying this in some way, you need to collect lots of statistics of
real-world use, or it's mere guesswork.

Pixels usually represent objects. Take a glance around your. How many
objects of a similar colour are spider's webs, lace curtains, long
wires, and so on. And how many are pieces of paper, coffee cups.
computer mice, and so on? And of course it mustn't fall over on the
unusual objects, but the main consideration is usually that it is fast
and efficient on the common ones.

There is nothing that I or anyone else can see that could possibly be considered a "typical" image - though there are clearly things that are commonly seen. And simple colour-matching flood fills are totally
pointless on any real image (photographs, realistic renderings, etc.).

But not always of course, Sometimes results must in in under 0.1
seconds, and so 0.09 is as good as 0.01, but 0.11 on a rare spider's web
is catastrophic.

I cannot imagine the situation where you would have a hard real-time
limit of 0.1 seconds to do a simplistic flood fill on a rare spider's
web (or picture thereof).

I think it would suffice to test the code on a few worst-case samples,
and a few examples of images you have yourself that you need to
flood-fill. If the speed is good enough on your computer during such
tests, that would be all you need. You are not making a real-world
reusable graphics library or a serious image manipulation tool here.
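
One way to build such a worst-case sample is a serpentine corridor, in
the spirit of the stripy path example mentioned above. The exact shape
Michael S. used is not shown in this part of the thread, so the layout
below (even rows open, odd rows walled except for one gap that
alternates between the right and left edge) is an assumption, as are
the names make_snake and count_path.

```c
#include <assert.h>

/* Build a w*h test image, row-major: 0 = corridor (to be filled), 1 = wall.
   Even rows are fully open; odd rows are walls with a single gap that
   alternates between the right edge and the left edge, so the open
   pixels form one long winding path starting at (0,0). */
static void make_snake(unsigned char *img, int w, int h) {
    assert(img && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            img[y * w + x] = (unsigned char)(y % 2);   /* odd rows are walls */
        if (y % 2 == 1) {
            int gap = (y % 4 == 1) ? w - 1 : 0;        /* right, left, right, ... */
            img[y * w + gap] = 0;
        }
    }
}

/* Number of corridor pixels, i.e. how many pixels a fill from (0,0) must change. */
static int count_path(const unsigned char *img, int w, int h) {
    int n = 0;
    for (int i = 0; i < w * h; i++)
        if (img[i] == 0) n++;
    return n;
}
```

Filling this image from (0,0) and comparing the number of changed pixels
with count_path() gives a simple correctness check, and timing it next to
a blank image of the same size covers the two candidate worst cases.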

That would be the "royal we", I presume? I know /I/ would have no use
for a flood-fill routine that did not support colour styles I use.

This routine is part of the binary image processing library, so of
course it is written to be easy to use with binary images, or binary
images which have been processed and are no longer strictly binary
images. But if people want to take it and use it as pattern for a
general flood fill, then of course I'm perfectly happy that they have
found the code to be of use.


From David Brown@21:1/5 to Scott Lurndal on Tue Mar 19 11:41:01 2024

On 18/03/2024 18:42, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 18/03/2024 16:28, David Brown wrote:

You still haven't considered using a spell-checker, even though you use
a news client with one built in? Perhaps you need a better keyboard?

I'll try it out. Since you're dyslexic.

I believe you're conflating David with someone else
who made that claim.

I did, in another post, say that I am mildly dyslexic. The context was
that I understand spelling can be difficult - my spelling, without a spell-checker, is often terrible. But my reading level is very high.

I expect most people here can figure out the words Malcolm meant to type
when he fails to press the right keys. But we should not have to do so.


From David Brown@21:1/5 to Keith Thompson on Tue Mar 19 12:31:05 2024

On 18/03/2024 19:10, Keith Thompson wrote:

Malcolm McLean <[email protected]> writes:

On 18/03/2024 16:28, David Brown wrote:

[...]

You still haven't considered using a spell-checker, even though you
use a news client with one built in? Perhaps you need a better
keyboard?

I'll try it out. Since you're dyslexic. Normal readers can read
English text with just the initial and terminal letters right and the
rest jumbled, at similar speed to normal text.

I will not speculate about why you seem to be unaware that calling
someone dyslexic is insulting, both to the person you're addressing and
to people with dyslexia.

You need to stop making disparaging personal comments.

I think Malcolm is truly unaware of how these kinds of comments could be
taken.

To be clear here, I did mention in another post that I am dyslexic. So
he was not saying it out of the blue.

However, it does not seem that he has a very good idea of what dyslexia,
in all its forms and variations, actually is. My dyslexia does not
affect my reading at all (as far as any measurements have ever shown),
but it affects my spelling quite a lot. (It has other effects too, but
we don't need to cover everything here.)

(Malcolm's comment about "normal readers" reading jumbled text has a
grain of truth to it, but not much more than a grain.)


From Michael S@21:1/5 to Tim Rentsch on Tue Mar 19 13:18:42 2024

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
UI k = 0;
UI n = 17;
Point *todo = malloc( n * sizeof *todo );

if( todo && change_it( w, h, pixels, p0, old, new ) )
todo[k++] = p0;

while( k > 0 ){
Index j = n-k;
memmove( todo + j, todo, k * sizeof *todo );
k = 0;

while( j < n ){
Point p = todo[ j++ ];
for( Index i = 0; i < 4; i++ ){
Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
if( ! change_it( w, h, pixels, q, old, new ) ) continue;
todo[ k++ ] = q;
}

if( j-k < 3 ){
Index new_n = n+n/4;
Index new_j = new_n - (n-j);
Point *t = realloc( todo, new_n * sizeof *t );
if( !t ){ k = 0; break; }
memmove( t + new_j, t + j, (n-j) * sizeof *t );
todo = t, n = new_n, j = new_j;
}
}
}

free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
return pixels[p.x][p.y] = new, 1;
}

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.
Is it the same algorithm?

Besides, I don't think that use of VLA in library code is a good idea.
VLA is optional in latest C standards. And incompatible with C++.
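
A minimal sketch of what a VLA-free interface could look like, assuming
the same layout as `Color pixels[w][h]` (element [x][y] stored at offset
x*h + y). The names pixel_at and change_it_flat are hypothetical, and
the parameter is called new_color rather than new so that the same
source also compiles as C++.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Color;

/* Flat-buffer equivalent of `Color pixels[w][h]`: element [x][y] is at x*h + y. */
static Color *pixel_at(Color *base, unsigned h, unsigned x, unsigned y) {
    return base + (size_t)x * h + y;
}

/* Same contract as change_it() above, without a variably modified parameter:
   returns 1 and recolours the pixel if (x,y) is in range and has colour
   `old`, otherwise returns 0 and changes nothing. */
static int change_it_flat(unsigned w, unsigned h, Color *base,
                          unsigned x, unsigned y, Color old, Color new_color) {
    assert(base != NULL);
    if (x >= w || y >= h) return 0;
    Color *p = pixel_at(base, h, x, y);
    if (*p != old) return 0;
    *p = new_color;
    return 1;
}
```

The cost is that the index arithmetic becomes explicit; the benefit is
that the function signature no longer depends on VLA support.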


From Richard Harnden@21:1/5 to David Brown on Tue Mar 19 12:19:47 2024

On 19/03/2024 10:41, David Brown wrote:

I expect most people here can figure out the words Malcolm meant to type
when he fails to press the right keys. But we should not have to do so.

So ... the poster should make the effort once, rather than the 1000s of
readers should be made to make the effort 1000s of times.

This is usenet etiquette 101.


From Peter 'Shaggy' Haywood@21:1/5 to All on Tue Mar 19 10:57:55 2024

Groovy hepcat Lew Pitcher was jivin' in comp.lang.c on Mon, 18 Mar 2024
01:27 am. It's a cool scene! Dig it.

On 16/03/2024 04:11, fir wrote:

[Snip.]

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

[Snippity doo dah.]

Take fir's example code above; a simple single call to RecolorizePixelAndAdjacentOnes() will effectively recolour the
origin cell multiple times, because of how the recursion is handled.

No, I don't think so. You seem to have missed the fact that it checks
the colour of the "current" pixel, and only continues (setting new
colour & recursing) if it is the old colour.
Of course, I'm inferring (guessing) the functionality, at least
partially (Unsafe? Safe?), of GetPixelUnsafe() and SetPixelSafe() based
on their names.
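
This is easy to check with a small instrumented copy of the function.
The sketch below is not fir's code: the 8x8 screen array, in_screen()
and the set_count map are stand-ins invented here for XYIsInScreen(),
GetPixelUnsafe() and SetPixelSafe(), whose real behaviour is only
guessed from their names.

```c
#include <assert.h>

enum { W = 8, H = 8 };

static unsigned screen[H][W];      /* hypothetical stand-in for the real screen */
static unsigned set_count[H][W];   /* how many times each pixel was written */

static int in_screen(int x, int y) {
    return x >= 0 && x < W && y >= 0 && y < H;
}

/* Same structure as RecolorizePixelAndAdjacentOnes(), plus a write counter. */
static int recolor(int x, int y, unsigned old_color, unsigned new_color) {
    if (old_color == new_color) return 0;

    if (in_screen(x, y) && screen[y][x] == old_color) {
        screen[y][x] = new_color;
        set_count[y][x]++;
        /* The pixel no longer has old_color, so later visits fail the
           test above and can never write it a second time. */
        assert(set_count[y][x] == 1);
        recolor(x + 1, y, old_color, new_color);
        recolor(x - 1, y, old_color, new_color);
        recolor(x, y - 1, old_color, new_color);
        recolor(x, y + 1, old_color, new_color);
        return 1;
    }
    return 0;
}

/* Largest write count over the whole screen. */
static unsigned max_set_count(void) {
    unsigned m = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if (set_count[y][x] > m) m = set_count[y][x];
    return m;
}
```

A pixel may be examined several times (once from each neighbour), but it
is written only once.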

[Snip Lew's examples.]

--

----- Dig the NEW and IMPROVED news sig!! -----

-------------- Shaggy was here! ---------------
Ain't I'm a dawg!!


From Michael S@21:1/5 to Malcolm McLean on Tue Mar 19 15:49:00 2024

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented it.
It wasn't me.)

Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call and
the data. And todo is actually a queue rather than a stack.

Now why would it be slower? Probaby because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, whilst you consider it 5 times for Tim's - once
for each neighbour and once for itself. Then horizontally adjacent
pixels are more likely to be in the same cache line than vertically
adjacent pixels, so processing images in scan lines tends to be a bit
faster.
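
For readers who have not seen the scan-line approach, here is a minimal
sketch of the general idea described above. It is not Malcolm's code
(which is not reproduced in this part of the thread): it keeps pending
seeds on a growable stack rather than in a queue, and the names
scanline_fill and Seed are invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Seed;

/* Fill the edge-connected region of colour `old` containing (x,y) with
   `new_color`. img is row-major, w*h bytes. Returns 1 if something was
   filled, 0 if there was nothing to do, -1 on allocation failure. */
int scanline_fill(unsigned char *img, int w, int h, int x, int y,
                  unsigned char old, unsigned char new_color)
{
    assert(img != NULL);
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (old == new_color || img[(size_t)y * w + x] != old) return 0;

    size_t cap = 64, n = 0;
    Seed *seeds = malloc(cap * sizeof *seeds);
    if (!seeds) return -1;
    seeds[n].x = x; seeds[n].y = y; n++;

    while (n > 0) {
        Seed s = seeds[--n];
        unsigned char *row = img + (size_t)s.y * w;
        if (row[s.x] != old) continue;   /* already filled via another seed */

        /* Extend the seed to the full horizontal run and fill it. */
        int x0 = s.x, x1 = s.x;
        while (x0 > 0 && row[x0 - 1] == old) x0--;
        while (x1 + 1 < w && row[x1 + 1] == old) x1++;
        for (int i = x0; i <= x1; i++) row[i] = new_color;

        /* In the rows above and below, push one seed per run of `old`. */
        for (int dy = -1; dy <= 1; dy += 2) {
            int ny = s.y + dy;
            if (ny < 0 || ny >= h) continue;
            const unsigned char *nrow = img + (size_t)ny * w;
            for (int i = x0; i <= x1; i++) {
                if (nrow[i] != old) continue;
                if (i > x0 && nrow[i - 1] == old) continue;  /* same run */
                if (n == cap) {
                    Seed *t = realloc(seeds, 2 * cap * sizeof *seeds);
                    if (!t) { free(seeds); return -1; }
                    seeds = t;
                    cap *= 2;
                }
                seeds[n].x = i; seeds[n].y = ny; n++;
            }
        }
    }
    free(seeds);
    return 1;
}
```

Each filled pixel is read once when its run is extended and once from
the row above and the row below, which matches the "three times" count
given above.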

Below is a variant of the recursive algorithm that is approximately as
fast as your code (1.25x faster for filling a solid rectangle, 1.43x
slower for filling a snake shape).
The code is a bit long, but I hope that the logic is still obvious and
there is no need to prove correctness.
I have a micro-optimized variant of the same algorithm that is as fast
as or faster than yours in all cases that I tested, but posting
micro-optimized code on c.l.c is bad sportsmanship.
The recursion depth of this algorithm for a typical solid shape is
O(max(width,height)), but in the worst case it is still very bad, about N/4.
And since there are more local variables to preserve, the worst-case
size of occupied stack is likely even bigger than in simple (but
non-naive) recursion. So, while fast, I wouldn't use this algorithm in a
general-purpose library.
But it can serve as a reference point for an implementation with an
explicit stack.

struct recursive_context_t {
unsigned char *grey;
int width, height;
unsigned char target, dest;
};

static void floodfill_r_core(const struct recursive_context_t* context,
int x, int y);

int floodfill_r(
unsigned char *grey,
int width, int height,
int x, int y,
unsigned char target, unsigned char dest)
{
if (x < 0 || x >= width || y < 0 || y >= height)
return 0;
if (grey[y*width+x] != target)
return 0;
struct recursive_context_t context = {
.grey = grey,
.width = width,
.height = height,
.target = target,
.dest = dest,
};
floodfill_r_core(&context, x, y);
return 1;
}

static void floodfill_r_core(const struct recursive_context_t* context,
int x, int y) {
// point (x,y) is in target rectangle and has target color.
// It's guaranteed by caller

// Find maximal cross (of Saint George's variety) with target color
// and center at (x,y)
// go left
int x0;
for (x0 = x-1; x0 >= 0 &&
context->grey[y*context->width+x0] == context->target; --x0);
++x0;
// go right
int x1;
for (x1 = x+1; x1 < context->width &&
context->grey[y*context->width+x1] == context->target; ++x1);
// go up
int y0;
for (y0 = y-1; y0 >= 0 &&
context->grey[y0*context->width+x] == context->target; --y0);
++y0;
// go down
int y1;
for (y1 = y+1; y1 < context->height &&
context->grey[y1*context->width+x] == context->target; ++y1);

// Fill cross with destination color
for (int i = x0; i < x1; ++i)
context->grey[y*context->width+i] = context->dest;
for (int i = y0; i < y1; ++i)
context->grey[i*context->width+x] = context->dest;

if (y > 0) { // recursion into points above horizontal line
unsigned char *row = &context->grey[(y-1)*context->width];
for (int i = x0; i < x1; ++i)
if (row[i] == context->target)
floodfill_r_core(context, i, y-1);
}
if (y+1 < context->height) { // recursion into points below horizontal line
unsigned char *row = &context->grey[(y+1)*context->width];
for (int i = x0; i < x1; ++i)
if (row[i] == context->target)
floodfill_r_core(context, i, y+1);
}
if (x > 0) { // recursion into points left of vertical line
unsigned char *col = &context->grey[x-1];
for (int i = y0; i < y1; ++i)
if (col[i*context->width] == context->target)
floodfill_r_core(context, x-1, i);
}
if (x+1 < context->width) { // recursion into points right of vertical line
unsigned char *col = &context->grey[x+1];
for (int i = y0; i < y1; ++i)
if (col[i*context->width] == context->target)
floodfill_r_core(context, x+1, i);
}
}


From fir@21:1/5 to Michael S on Tue Mar 19 16:05:50 2024

Michael S wrote:

On Mon, 18 Mar 2024 03:00:32 -0700
Tim Rentsch <[email protected]> wrote:

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are
potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).

The problem in this case is that max. depth of recursion is O(N) where N
is total number of pixels to change color. So far I didn't find an
obvious way to cut the worst case by more than small factor without
turning recursive algorithm into something that is unrecognizably
different from the original and requires a proof of correctness of its own.
The classic "divide and conquer, smaller part first" strategy does not
appear applicable here, or at least not obviously.

in reality it is less i guess..
well that would be like if i would like to recolor
vertical line of say length 2 million pixels
- i would go always one pixel right 2 million times

if this is 100x100 square and i put the initiation
in middle it would go 50x right then at depth 50
it would go one up then i guess 100 times left

then just about this line up until up edge of picture
- then it probably revert back (with a lot
of false is) to first line and then go down

- so it seems (though i was not checking it
too much in my head) that the depth in that case
would be about half

- but this is because its much unfortunate,
'normally' i think the recursion depth
should be more like to edge of an area

(i will answer more later as i hate usenet by newsreader
so inconvenient to read and answer its pain)

the problem has a couple of aspects imo
- interesting is in fact the great simplicity
of this recursion method esp in that case - which gives to think


From bart@21:1/5 to fir on Tue Mar 19 17:16:42 2024

On 19/03/2024 15:05, fir wrote:

Michael S wrote:

On Mon, 18 Mar 2024 03:00:32 -0700
Tim Rentsch <[email protected]> wrote:

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are
potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).

The problem in this case is that max. depth of recursion is O(N) where N
is total number of pixels to change color. So far I didn't find an
obvious way to cut the worst case by more than small factor without
turning recursive algorithm into something that is unrecognizably
different from the original and requires a proof of correctness of its own.
The classic "divide and conquer, smaller part first" strategy does not
appear applicable here, or at least not obviously.

in reality it is less i guess..
well that would be like if i would like to recolor
vertical line of say length 2 million pixels
- i would go always one pixel right 2 million times

if this is 100x100 square and i put the initiation
in middle it would go 50x right then at depth 50
it would go one up then i guess 100 times left

then just about this line up until up edge of picture
- then it probably revert back (with a lot
of false is) to first line and then go down

That's what I thought until I tried it.

If I start with an 18x18 image of all zeros, then fill starting from the
centre with a 'colour' that is an incrementing value, then the final
image displayed as a table of integers looks like this:

171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10
172 173 174 175 176 177 178 179 180 1 2 3 4 5 6 7 8 9
209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

By following the sequence starting from 1, you can see the fill-pattern.

It's not clear how it gets from 171 at top left to 172 half-way down the
left edge.
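
The experiment is easy to reproduce. The sketch below assumes the same
neighbour order as fir's function (x+1, x-1, y-1, y+1), a grid indexed
as grid[y][x], and a start at x=9, y=9, which seems to match the table
above; the names fill_numbered, next_value and max_depth are invented
here.

```c
#include <assert.h>

enum { N = 18 };

static int grid[N][N];        /* grid[y][x]; 0 means "not yet filled" */
static int next_value;        /* last value written so far */
static int depth, max_depth;  /* current and deepest recursion level */

/* Recursive fill that writes 1, 2, 3, ... in the order pixels are changed. */
static void fill_numbered(int x, int y) {
    if (x < 0 || x >= N || y < 0 || y >= N || grid[y][x] != 0) return;
    grid[y][x] = ++next_value;
    assert(next_value <= N * N);
    if (++depth > max_depth) max_depth = depth;
    fill_numbered(x + 1, y);
    fill_numbered(x - 1, y);
    fill_numbered(x, y - 1);
    fill_numbered(x, y + 1);
    --depth;
}
```

Under those assumptions the jump from 171 to 172 is the recursion
unwinding. After 171 is written in the top-left corner, every pending
call finds its remaining neighbours already filled and returns, until
control is back in the call that wrote 27 at the left end of the row
above the start row. Its last neighbour (y+1) is still zero, so that is
where 172 goes. The max_depth counter shows how deep the stack got,
which for this start point should be roughly half the pixel count.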


From Michael S@21:1/5 to Malcolm McLean on Tue Mar 19 19:18:59 2024

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

On 19/03/2024 11:18, Michael S wrote:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
UI k = 0;
UI n = 17;
Point *todo = malloc( n * sizeof *todo );

if( todo && change_it( w, h, pixels, p0, old, new ) )
todo[k++] = p0;

while( k > 0 ){
Index j = n-k;
memmove( todo + j, todo, k * sizeof *todo );
k = 0;

while( j < n ){
Point p = todo[ j++ ];
for( Index i = 0; i < 4; i++ ){
Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
if( ! change_it( w, h, pixels, q, old, new ) ) continue;
todo[ k++ ] = q;
}

if( j-k < 3 ){
Index new_n = n+n/4;
Index new_j = new_n - (n-j);
Point *t = realloc( todo, new_n * sizeof *t );
if( !t ){ k = 0; break; }
memmove( t + new_j, t + j, (n-j) * sizeof *t );
todo = t, n = new_n, j = new_j;
}
}
}

free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
return pixels[p.x][p.y] = new, 1;
}

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.
Is it the same algorithm?

No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented it.
It wasn't me.)

Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call and
the data. And todo is actually a queue rather than a stack.

Now why would it be slower? Probaby because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, whilst you consider it 5 times for Tim's - once
for each neighbour and once for itself. Then horizontally adjacent
pixels are more likely to be in the same cache line than vertically
adjacent pixels, so processing images in scan lines tends to be a bit
faster.

I did a little more investigation, gradually modifying Tim's code for
improved performance without changing the basic principle of the
algorithm. Yes, micro-optimization. Yes, I said earlier that doing so
in c.l.c is bad sportsmanship. So what? I never claimed to be an
ideal sportsman.
The point is that after optimizations it's actually faster than the
best implementations of the original recursive algorithm, including an
implementation that uses an explicit stack and is quite economical in its
memory consumption. Tim's algorithm is 8 times less economical (8 bytes
per saved node vs 1 byte in the explicit stack) and nevertheless almost
twice as fast for both shapes that I was testing.
So far, this algorithm is the fastest among all "local" algorithms that I
tried. By "local" I mean algorithms that don't try to recolor more than
one pixel at a time.
"Non-local" algorithms, i.e. yours and my recursive algorithm that
recolors a St. George cross, are somewhat faster, but I suspect that
it's because all shapes that I use for testing have either long
columns or long rows or both.
The nice thing about Tim's method is that we can expect that
performance depends on the number of recolored pixels and almost nothing
else.
The second nice thing is that it is easy to understand. Not as easy as
the original recursive method, but easier than the rest of them.

If you or somebody else is interested, here is [micro]optimized variant:

#include <stdlib.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char Color;
typedef int UI;
typedef struct { UI x, y; } Point;

static inline
Point* circularIncr(Point* p, Point* beg, Point* end) {
return p + 1 == end ? beg : p + 1;
}

static inline
Point mk_point(int x, int y) {
Point pt={x,y};
return pt;
}

int floodfill_r(
Color *pixels,
int w, int h,
int pt0_x, int pt0_y,
Color old, Color new)
{
if (pt0_x < 0 || pt0_x >= w || pt0_y < 0 || pt0_y >= h)
return 0;

if (pixels[pt0_y*w+pt0_x] != old)
return 0;

pixels[pt0_y*w+pt0_x] = new;

const ptrdiff_t INITIAL_TODO_SIZE = 125;
Point *todo = malloc( (INITIAL_TODO_SIZE+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if (!todo)
return -1;
Point* todo_end = &todo[INITIAL_TODO_SIZE];

todo[0] = mk_point(pt0_x, pt0_y);
Point* wr = &todo[1];
Point* rd = todo;
ptrdiff_t free_space = INITIAL_TODO_SIZE - 1;
do {
Point pt = *rd;
rd = circularIncr(rd, todo, todo_end);
Point* prev_wr = wr;
if (pt.x > 0 && pixels[pt.y*w+pt.x-1] == old) {
pixels[pt.y*w+pt.x-1] = new;
*wr++ = mk_point(pt.x-1, pt.y);
}
if (pt.y > 0 && pixels[pt.y*w+pt.x-w] == old) {
pixels[pt.y*w+pt.x-w] = new;
*wr++ = mk_point(pt.x, pt.y-1);
}
if (pt.x+1 < w && pixels[pt.y*w+pt.x+1] == old) {
pixels[pt.y*w+pt.x+1] = new;
*wr++ = mk_point(pt.x+1, pt.y);
}
if (pt.y+1 < h && pixels[pt.y*w+pt.x+w] == old) {
pixels[pt.y*w+pt.x+w] = new;
*wr++ = mk_point(pt.x, pt.y+1);
}

free_space += 1 - (wr - prev_wr);
if (wr >= todo_end) {
memcpy(todo, todo_end, (wr - todo_end)*sizeof(*wr));
wr += todo - todo_end;
}

if (free_space < 4) {
ptrdiff_t rdi = rd-todo;
ptrdiff_t wri = wr-todo;
ptrdiff_t sz = todo_end - todo;
ptrdiff_t incr = sz/4;
Point* new_todo = realloc(todo, (sz+incr+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if(!new_todo) {
free(todo);
return -1;
}
free_space += incr;
rd = &new_todo[rdi];
wr = &new_todo[wri];
todo = new_todo;
todo_end = &todo[sz+incr];
if (rd >= wr) {
memmove(&rd[incr], rd, (sz-rdi) * sizeof *todo );
rd = &rd[incr];
}
}
} while (rd != wr);

free( todo );
return 1;
}


From bart@21:1/5 to bart on Tue Mar 19 17:33:56 2024

On 19/03/2024 17:16, bart wrote:

On 19/03/2024 15:05, fir wrote:

if this is 100x 100 square and i put the initioation
in middle it would go 50x right then at depth 50
it would go one up than i guess 100 times left

then just about this line up until up edge of picture
- then it probably revert back (with a lot
of false is) to first line and then go down

That's what I thought until I tried it.

If I start with an 18x18 image of all zeros, then fill starting from the centre with a 'colour' that is an incrementing value, then the final
image displayed as a table of integers looks like this:

171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
 63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
 28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
253 254 255 325 257 258 259 260 261 262 263 264 265 266 267 268 269 270
288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

It's not clear how it gets from 171 at top left to 172 half-way down the
left edge.

Actually, a more revealing picture is produced when storing the
call depth in each cell rather than the sequence number (these images are now
16 bits/cell rather than 8):

171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
 63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
 28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
 28  29  30  31  32  33  34  35  36   1   2   3   4   5   6   7   8   9
 65  66  67  68  69  70  71  72  37  38  39  40  41  42  43  44  45  46
 64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47
 65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82
100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119
137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155

At least now I know how it gets from the top left to the 172 in the top
image: it must do a cascade of returns until it gets back to call depth
27 in this second chart, then it does the cell immediately below.
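
For anyone who wants to reproduce the two tables, here is a minimal sketch in C. It assumes the neighbour order of the original function (x+1, x-1, y-1, y+1) and a start at column 9, row 9 counting from zero. The trace_* names are made up for this example and are not the actual test code used for the post. Cells hold plain ints, so the sequence simply runs 1..324; the first table above shows 325 where 256 would be expected, and this sketch does not try to reproduce that.

```c
#include <assert.h>

enum { TRACE_N = 18 };                     /* grid size used in the post */

static int trace_seq[TRACE_N][TRACE_N];    /* visit order, 0 = not visited yet */
static int trace_depth[TRACE_N][TRACE_N];  /* call depth at which a cell was filled */
static int trace_counter;                  /* number of cells filled so far */
static int trace_max_depth;                /* deepest call that filled a cell */

/* Same shape as the original recursive fill, but it records the
   sequence number and the call depth instead of a colour. */
static void trace_fill(int x, int y, int depth)
{
    if (x < 0 || x >= TRACE_N || y < 0 || y >= TRACE_N) return;
    if (trace_seq[y][x] != 0) return;      /* already filled */

    trace_seq[y][x] = ++trace_counter;
    trace_depth[y][x] = depth;
    if (depth > trace_max_depth) trace_max_depth = depth;

    trace_fill(x + 1, y, depth + 1);       /* right */
    trace_fill(x - 1, y, depth + 1);       /* left  */
    trace_fill(x, y - 1, depth + 1);       /* up    */
    trace_fill(x, y + 1, depth + 1);       /* down  */
}

/* Clears both tables and runs the fill from (x0, y0) at call depth 1. */
static void trace_run(int x0, int y0)
{
    assert(x0 >= 0 && x0 < TRACE_N && y0 >= 0 && y0 < TRACE_N);
    for (int y = 0; y < TRACE_N; y++)
        for (int x = 0; x < TRACE_N; x++)
            trace_seq[y][x] = trace_depth[y][x] = 0;
    trace_counter = 0;
    trace_max_depth = 0;
    trace_fill(x0, y0, 1);
}
```

Calling trace_run(9, 9) and printing trace_seq and trace_depth row by row should give tables of the same shape as the two above. The interesting number is trace_max_depth: for the 18x18 case it comes out above half the cell count, which is the O(N) depth problem in miniature.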


From fir@21:1/5 to bart on Tue Mar 19 23:07:30 2024

bart wrote:

On 19/03/2024 15:05, fir wrote:

Michael S wrote:

On Mon, 18 Mar 2024 03:00:32 -0700
Tim Rentsch <[email protected]> wrote:

bart <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

You have evidence to suggest that the opposite is true?

The claim is that recursion always makes programs harder to reason
about and prove correct. It's easy to find examples that show
recursion does not always make programs harder to reason about and
prove correct.

I personally find recursion hard work and errors much harder to
debug.

Most likely that's because you haven't had the relevant background
in learning how to program in a functional style. That matches my
own experience: it was only after learning how to write programs in
a functional style that I really started to appreciate the benefits
of using recursion, and to understand how to write and reason about
recursive programs.

It also becomes much more important to show that it will not cause
stack overflow.

In most cases it's enough to show that the stack depth never exceeds
log N for an input of size N. I use recursion quite routinely
without there being any significant danger of stack overflow. It's
a matter of learning which patterns are safe and which patterns are
potentially dangerous, and avoiding the dangerous patterns (unless
certain guarantees can be made to make them safe again).

The problem in this case is that the max. depth of recursion is O(N), where N
is the total number of pixels to change color. So far I haven't found an
obvious way to cut the worst case by more than a small factor without
turning the recursive algorithm into something that is unrecognizably
different from the original and requires a proof of correctness of its own.
The classic "divide and conquer, smaller part first" strategy does not
appear applicable here, or at least not obviously.

in reality it is less i guess..
well that would be like if i would like to recolor
vertical line of say length 2 milion pixels
- i would go always one pixel right 2 milion times

if this is 100x 100 square and i put the initioation
in middle it would go 50x right then at depth 50
it would go one up than i guess 100 times left

then just about this line up until up edge of picture
- then it probably revert back (with a lot
of false is) to first line and then go down

That's what I thought until I tried it.

If I start with an 18x18 image of all zeros, then fill starting from the centre with a 'colour' that is an incrementing value, then the final
image displayed as a table of integers looks like this:

171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
 63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
 28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
253 254 255 325 257 258 259 260 261 262 263 264 265 266 267 268 269 270
288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

By following the sequence starting from 1, you can see the fill-pattern.

It's not clear how it gets from 171 at top left to 172 half-way down the
left edge.

well, it's exactly what i said and it is clear imo. what's not clear? it simply
keeps developing "try right, try left, try up, try down" until it
clashes with the top edge, then it gets back to level one depth and continues,
but now until it meets the bottom edge. but fine, you tested it


From fir@21:1/5 to Malcolm McLean on Tue Mar 19 23:23:48 2024

Malcolm McLean wrote:

On 16/03/2024 18:21, Scott Lurndal wrote:

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

Example given. A recursive algorithm which is hard to reason about and

Perhaps hard for _you_ to reason about. That doesn't
generalize to every other programmer that might read that
code.

From experience this one blows the stack, but not always. Sometimes
it's OK to use.

Since you can reason about it so easily, you can tell the others when
you're OK and when you are not, in a handy intuitive way so that someone thinking of implementing it will know.

from an "effective c" (or maybe i should call it "optimal") point of view,
recursion as it is implemented in c is wrong (it is sub-optimal),
so i agree with that statement. however, a big part of the world today
programs in a non-optimal way (for example python and other such ways of
programming are non-optimal), and thus recursion may be seen as just one more such way

..and i must say when i was younger i was much more fixated on optimal
coding. today i hardly code at all and have lost interest (i mean
"being interested") in many things. i also find the
non-optimal cases more interesting, and generally this recursion would
be bearable, especially if for example windows has this "guard page",
some exception-based mechanism that would allow the stack to be resized
beyond its default two megabytes (which is a bit small for today)
instead of crashing the application

i'm not sure however whether this is done and whether it is possible. as i understand it, there
is an exception when trying to read or write immediately past the reserved stack,
but not necessarily if someone just moves the stack pointer (like when
allocating stack space for an array). but those mechanisms could be handy


From fir@21:1/5 to David Brown on Tue Mar 19 23:52:30 2024

David Brown wrote:

rks, especially if the flood-fill is on live displayed data rather than
in a buffer off-screen. But typically you need to get a /lot/ more
advanced (i.e., not your algorithm) to improve on the OP's version by an order of magnitude, so if speed is not essential but understanding that
it is correct

this code of mine was the fastest implementation to write when i needed to test something in 3 minutes, and the effect looks good
(i wrote an editor for low resolution drawing where i select the
given colour piece; then, once it is selected, by pressing control or shift
and moving the mouse i recolorise this component in a fluid way

it is good because you can see which colours fit with other colours, and
some editors like paint don't allow that, so composing an image with
fitting colours takes a much larger amount of work)

btw you can see it was written as an ad hoc solution, because from an
optimisation point of view passing old_color and new_color,
which are always the same (passing them through the whole branch,
potentially a million times), is nonsense. but this "branch"
(i wouldn't call it a function, it's rather a branch) needs
that data, and if not passing it as args i would need to make
standalone variables


From fir@21:1/5 to Michael S on Wed Mar 20 00:03:04 2024

Michael S wrote:

On Sun, 17 Mar 2024 14:56:34 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 15:09, Malcolm McLean wrote:

On 16/03/2024 14:40, David Brown wrote:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern.
But not sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove
correct than the OP's original one, and unlikely to be very much
faster (it will certainly scale in the same way in both time and
memory usage).

Now is this David Brown being David Brown, or is it actually true?

And I need to run some tests, don't I?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int floodfill_r(unsigned char *grey, int width, int height, int x,
int y, unsigned char target, unsigned char dest)
{
if (x < 0 || x >= width || y < 0 || y >= height)
return 0;
if (grey[y*width+x] != target)
return 0;
grey[y*width+x] = dest;
floodfill_r(grey, width, height, x - 1, y, target, dest);
floodfill_r(grey, width, height, x + 1, y, target, dest);
floodfill_r(grey, width, height, x, y - 1, target, dest);
floodfill_r(grey, width, height, x, y + 1, target, dest);

return 0;
}

/**
Floodfill4 - floodfill, 4 connectivity.

@param[in,out] grey - the image (formally it's greyscale but it
could be binary or indexed)
@param width - image width
@param height - image height
@param x - seed point x
@param y - seed point y
@param target - the colour to flood
@param dest - the colur to replace it by.
@returns Number of pixels flooded.
*/
int floodfill4(unsigned char *grey, int width, int height, int x, int
y, unsigned char target, unsigned char dest)
{
int *qx = 0;
int *qy = 0;
int qN = 0;
int qpos = 0;
int qcapacity = 0;
int wx, wy;
int ex, ey;
int tx, ty;
int ix;
int *temp;
int answer = 0;

if(grey[y * width + x] != target)
return 0;
qx = malloc(width * sizeof(int));
qy = malloc(width * sizeof(int));
if(qx == 0 || qy == 0)
goto error_exit;
qcapacity = width;
qx[qpos] = x;
qy[qpos] = y;
qN = 1;

while(qN != 0)
{
tx = qx[qpos];
ty = qy[qpos];
qpos++;
qN--;

if(qpos == 256)
{
memmove(qx, qx + 256, qN*sizeof(int));
memmove(qy, qy + 256, qN*sizeof(int));
qpos = 0;
}
if(grey[ty*width+tx] != target)
continue;
wx = tx;
wy = ty;
while(wx >= 0 && grey[wy*width+wx] == target)
wx--;
wx++;
ex = tx;
ey = ty;
while(ex < width && grey[ey*width+ex] == target)
ex++;
ex--;

for(ix=wx;ix<=ex;ix++)
{
grey[ty*width+ix] = dest;
answer++;
}

if(ty > 0)
for(ix=wx;ix<=ex;ix++)
{
if(grey[(ty-1)*width+ix] == target)
{
if(qpos + qN == qcapacity)
{
temp = realloc(qx, (qcapacity + width) * sizeof(int));
if(temp == 0)
goto error_exit;
qx = temp;
temp = realloc(qy, (qcapacity + width) * sizeof(int));
if(temp == 0)
goto error_exit;
qy = temp;
qcapacity += width;
}
qx[qpos+qN] = ix;
qy[qpos+qN] = ty-1;
qN++;
}
}
if(ty < height -1)
for(ix=wx;ix<=ex;ix++)
{
if(grey[(ty+1)*width+ix] == target)
{
if(qpos + qN == qcapacity)
{
temp = realloc(qx, (qcapacity + width) * sizeof(int));
if(temp == 0)
goto error_exit;
qx = temp;
temp = realloc(qy, (qcapacity + width) * sizeof(int));
if(temp == 0)
goto error_exit;
qy = temp;
qcapacity += width;
}
qx[qpos+qN] = ix;
qy[qpos+qN] = ty+1;
qN++;
}
}
}

free(qx);
free(qy);

return answer;
error_exit:
free(qx);
free(qy);
return -1;
}

int main(void)
{
unsigned char *image;
clock_t tick, tock;
int i;

image = malloc(100 * 100);
tick = clock();
for (i = 0 ; i < 10000; i++)
{
memset(image, 0, 100 * 100);
floodfill_r(image, 100, 100, 50, 50, 0, 1);
}
tock = clock();
printf("floodfill_r %g\n", ((double)(tock -
tick))/CLOCKS_PER_SEC);

tick = clock();
for (i = 0 ; i < 10000; i++)
{
memset(image, 0, 100 * 100);
floodfill4(image, 100, 100, 50, 50, 0, 1);
}
tock = clock();
printf("floodfill4 %g\n", ((double)(tock - tick))/CLOCKS_PER_SEC);

return 0;
}

Let's give it a whirl

malcolm@Malcolms-iMac cscratch % gcc -O3 testfloodfill.c
malcolm@Malcolms-iMac cscratch % ./a.out
floodfill_r 1.69274
floodfill4 0.336705

I find your performance measurement non-decisive for two reasons:
(1) because your test case is too trivial and probably uncharacteristic
and
(2) because recursive variant could be trivially rewritten in a way
that reduces # of stack memory accesses by factor of 2 or 3.
Like that:

struct recursive_context_t {
unsigned char *grey;
int width, height;
unsigned char target, dest;
};

static void floodfill_r_core(const struct recursive_context_t* context,
int x, int y) {
if (x < 0 || x >= context->width || y < 0 || y >= context->height)
return;
if (context->grey[y*context->width+x] == context->target) {
context->grey[y*context->width+x] = context->dest;
floodfill_r_core(context, x - 1, y);
floodfill_r_core(context, x + 1, y);
floodfill_r_core(context, x, y - 1);
floodfill_r_core(context, x, y + 1);
}
}

int floodfill_r(
unsigned char *grey,
int width, int height,
int x, int y,
unsigned char target, unsigned char dest)
{
if (x < 0 || x >= width || y < 0 || y >= height)
return 0;
if (grey[y*width+x] != target)
return 0;
struct recursive_context_t context = {
.grey = grey,
.width = width,
.height = height,
.target = target,
.dest = dest,
};
floodfill_r_core(&context, x, y);
return 1;
}

i'm not quite sure what you do here.. pass the structure? in fact
the thing you call context need not be passed at all; just make it
standalone static variables, because they are the same for the whole
"branch" (a given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}


From fir@21:1/5 to Michael S on Wed Mar 20 00:30:56 2024

Michael S wrote:

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all just make is
standalone static variables becouse they/it is the same for whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.

as thread-safe as the previous one. and i'm just saying that i added those new_color and old_color arguments for convenience, as it was all a
functional test of whether the recolorisation visibly works, and how,
in my low-res (big pixel) graphics editor. there is no need to pass them
down the stack. also the test "if old_color == new_color then return"
strictly speaking probably does not need to be propagated into every recursive call

i also made an unnecessary Safe call (SetPixelSafe, after the bounds were already checked). i use two methods: SetPixelSafe,
which just checks whether x,y is in the array at all, and SetPixelUnsafe,
which is simply frame[y*frame_width+x] = color

so if you were interested in speed comparisons you wouldn't need to pass the structure at all, and that would be faster

i generally agree with both mclean and brown in this discussion:
1) it's not optimal so it's kinda wrong
2) it is simple so it's kinda usable, and for some uses it's handy, so
not all accusations of this being wrong are justified


From Michael S@21:1/5 to fir on Wed Mar 20 01:17:59 2024

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all just make is
standalone static variables becouse they/it is the same for whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.


From fir@21:1/5 to Michael S on Wed Mar 20 01:13:10 2024

Michael S wrote:

On Sat, 16 Mar 2024 11:33:20 +0000
Malcolm McLean <[email protected]> wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom
for my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area
of of one color into another color (becouse if no someone would
need to do it by hand pixel by pixel and the need to change color
of given element is very common)

there is very simple method of doing it - i men i click in given
color pixel then replace it by my color and call the same function
on adjacent 4 pixels (only need check if it is in screen at all and
if the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned
old_color, unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this -
incidentally as i said i use this editor to low res graphics like
200x200 pixels or less, and it is only a toll of private use,
yet i got no time to work on it more than 1-2-3 days i guess but
still

is there maybe simple way to improve it?

This is a cheap and cheerful flood fill. And it's easy to get right
and shouldn't fall over.

Except I don't understand why it works at all.
Can't the fill area have sub-areas that are connected only through a diagonal?

this is a fair remark.. i simply hadn't thought about it.. but those are kinda
details; i may just modify the function if i notice i need the
diagonally connected case
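
On the diagonal question, here is a minimal sketch of what such a modification might look like, assuming an 8-bit image stored row by row. fill_region and its diagonals flag are hypothetical names invented for this example, not anything from the editor. With diagonals == 0 it follows only the four edge-sharing neighbours, like the original; otherwise it also follows the four corner-sharing ones. It is still plain recursion, so the stack-depth concerns from this thread apply unchanged.

```c
#include <assert.h>

/* Hypothetical helper: recolours the region around (x, y) and returns
   the number of pixels changed. Recursive, so meant for small images only. */
static int fill_region(unsigned char *pixels, int w, int h,
                       int x, int y,
                       unsigned char old_color, unsigned char new_color,
                       int diagonals)
{
    assert(pixels && w > 0 && h > 0);
    if (old_color == new_color) return 0;            /* would never terminate otherwise */
    if (x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (pixels[y * w + x] != old_color) return 0;

    pixels[y * w + x] = new_color;
    int count = 1;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;        /* the pixel itself */
            if (!diagonals && dx != 0 && dy != 0)
                continue;                            /* skip corner neighbours */
            count += fill_region(pixels, w, h, x + dx, y + dy,
                                 old_color, new_color, diagonals);
        }
    }
    return count;
}
```

On a 2x2 checkerboard, filling from one corner with diagonals == 0 changes a single pixel, while diagonals != 0 also reaches the opposite corner.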

note how the topic was born: i was writing the editor, and the simple
editor is 1-2 days of coding work. in it, the "recolorisation
of the selected (by mouse click) area" was a 30-minute try, then i moved on

i posted the topic here as i felt i had no time to think through whether it would
blow up my program or not, but that 30-minute task was meant for 30 minutes,
not for a multi-hour discussion

however, i often like to post some piece of coding so that it turns into
a multi-hour discussion, to get firmer ground on some things; then the coding
becomes more solid


From fir@21:1/5 to bart on Wed Mar 20 01:48:54 2024

bart wrote:

On 16/03/2024 04:11, fir wrote:

i was writing simple editor (something like paint but more custom for
my eventual needs) for big pixel (low resolution) drawing

it showed in a minute i need a click for changing given drawed area of
of one color into another color (becouse if no someone would need to
do it by hand pixel by pixel and the need to change color of given
element is very common)

there is very simple method of doing it - i men i click in given color
pixel then replace it by my color and call the same function on
adjacent 4 pixels (only need check if it is in screen at all and if
the color to change is that initial color

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

it work but im not quite sure how to estimate the safety of this -

On my machine, it's OK up to a 400x400 image (starting with all one
colour and filling from the centre with another colour).

At 500x500, I get stack overflow. The 400x400 the maximum recursion
depth is 80,000 calls.

i was thinking about this recursion a bit more generally, and
i observed that those very long depth chains are kind of a problem of this recursion, because maybe it is better suited to being run in parallel

if you just 'forked' that one call into 4 parallel calls you wouldn't get
that problem, as it would then work 'horizontally' (shallow, like in a
shallow, breadth-first search), not 'vertically' (in depth, like a deep search)

and if someone rewrote it in a non-recursive way, then it would be
natural to rewrite it to work horizontally, which is better

if someone forked it in a truly parallel way, then the problem of synchronising ram accesses appears

this observation, however, may be seen as a strength of recursion,
as it naturally shows it works well with micro-parallelisation
(a crowd of execution channels)
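
The 'horizontal' idea can be sketched without any parallelism as a breadth-first fill with a plain queue. This is only an illustration under assumptions: an 8-bit image stored row by row, and hypothetical names (QPoint, fill_breadth_first, max_pending) that appear nowhere else in the thread. For simplicity the queue is sized for the worst case of w*h points, which is safe because a pixel is recoloured at the moment it is queued and so is queued at most once; max_pending reports how many points were actually waiting at the busiest moment.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } QPoint;

/* Returns the number of pixels recoloured, or -1 if allocation fails.
   If max_pending is not NULL it receives the largest queue length seen. */
static long fill_breadth_first(unsigned char *pixels, int w, int h,
                               int x0, int y0,
                               unsigned char old_color, unsigned char new_color,
                               long *max_pending)
{
    assert(pixels && w > 0 && h > 0);
    if (max_pending) *max_pending = 0;
    if (old_color == new_color) return 0;
    if (x0 < 0 || x0 >= w || y0 < 0 || y0 >= h) return 0;
    if (pixels[(size_t)y0 * (size_t)w + (size_t)x0] != old_color) return 0;

    QPoint *queue = malloc((size_t)w * (size_t)h * sizeof *queue);
    if (!queue) return -1;

    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    long head = 0, tail = 0, peak = 0;

    pixels[(size_t)y0 * (size_t)w + (size_t)x0] = new_color;
    queue[tail++] = (QPoint){ x0, y0 };

    while (head < tail) {
        if (tail - head > peak) peak = tail - head;
        QPoint p = queue[head++];                 /* oldest point first */
        for (int i = 0; i < 4; i++) {
            int nx = p.x + dx[i], ny = p.y + dy[i];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            size_t idx = (size_t)ny * (size_t)w + (size_t)nx;
            if (pixels[idx] != old_color) continue;
            pixels[idx] = new_color;              /* mark when queued */
            queue[tail++] = (QPoint){ nx, ny };
        }
    }

    free(queue);
    if (max_pending) *max_pending = peak;
    return tail;                                  /* every queued point was recoloured */
}
```

For an empty square filled from the centre, the pending points form roughly a diamond-shaped front, so their number grows with the perimeter of the region rather than with its area. A ring buffer sized to that front, like the ones posted elsewhere in this thread, would use far less memory than the worst-case allocation used here.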


From fir@21:1/5 to Peter 'Shaggy' Haywood on Wed Mar 20 01:26:34 2024

Peter 'Shaggy' Haywood wrote:

Groovy hepcat Lew Pitcher was jivin' in comp.lang.c on Mon, 18 Mar 2024
01:27 am. It's a cool scene! Dig it.

On 16/03/2024 04:11, fir wrote:

[Snip.]

int RecolorizePixelAndAdjacentOnes(int x, int y, unsigned old_color,
unsigned new_color)
{
if(old_color == new_color) return 0;

if(XYIsInScreen( x, y))
if(GetPixelUnsafe(x,y)==old_color)
{
SetPixelSafe(x,y,new_color);
RecolorizePixelAndAdjacentOnes(x+1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x-1, y, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y-1, old_color, new_color);
RecolorizePixelAndAdjacentOnes(x, y+1, old_color, new_color);
return 1;
}

return 0;
}

[Snippity doo dah.]

Take fir's example code above; a simple single call to
RecolorizePixelAndAdjacentOnes() will effectively recolour the
origin cell multiple times, because of how the recursion is handled.

No, I don't think so. You seem to have missed the fact that it checks
the colour of the "current" pixel, and only continues (setting new
colour & recursing) if it is the old colour.
Of course, I'm inferring (guessing) the functionality, at least
partially (Unsafe? Safe?), of GetPixelUnsafe() and SetPixelSafe() based
on their names.

[Snip Lew's examples.]

Safe and Unsafe mean that Safe checks whether x,y is inside the array of
pixels, while Unsafe just writes without checking. i draw into an array of
unsigned 32-bit ARGB or BGRA (i never remember) pixels, then i blit that
'bitmap' onto the window client area, as can be done in winapi

here is the exact code

inline void SetPixelUnsafe(int x, int y, unsigned color)
{
extern int frame_size_x ;
extern int frame_size_y ;
extern unsigned int* frame_bitmap ;

frame_bitmap[y*frame_size_x+x]=color;
}

inline void SetPixelSafe(int x, int y, unsigned color)
{
// if(frame==0) ERROR_EXIT("frame is zero in setpixelsafe ");
if(x<0) return;
if(x>frame_size_x-1) return;
if(y<0) return;
if(y>frame_size_y-1) return;

frame_bitmap[y*frame_size_x+x]=color;
}

there was a mistake in that function before: since i already check the bounds, i
should be using the Unsafe versions of setpixel and getpixel, but i tested
this for whether it works, not for optimisation, so i didn't care


From fir@21:1/5 to fir on Wed Mar 20 01:36:29 2024

fir wrote:

this code of mine was the fastest implementation to write when i needed to test something in 3 minutes, and the effect looks good

i mean probably 30 minutes


From fir@21:1/5 to Malcolm McLean on Wed Mar 20 02:15:32 2024

Malcolm McLean wrote:

On 16/03/2024 15:09, Malcolm McLean wrote:

On 16/03/2024 14:40, David Brown wrote:

On 16/03/2024 12:33, Malcolm McLean wrote:

And here's some code I wrote a while ago. Use that as a pattern. But
not sure how well it works. Haven't used it for a long time.

https://github.com/MalcolmMcLean/binaryimagelibrary/blob/master/drawbinary.c

Your implementation is a mess, /vastly/ more difficult to prove
correct than the OP's original one, and unlikely to be very much
faster (it will certainly scale in the same way in both time and
memory usage).

Now is this David Brown being David Brown, or is it actually true?

And I need to run some tests, don't I?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int floodfill_r(unsigned char *grey, int width, int height, int x, int y, unsigned char target, unsigned char dest)
{
if (x < 0 || x >= width || y < 0 || y >= height)
return 0;
if (grey[y*width+x] != target)
return 0;
grey[y*width+x] = dest;
floodfill_r(grey, width, height, x - 1, y, target, dest);
floodfill_r(grey, width, height, x + 1, y, target, dest);
floodfill_r(grey, width, height, x, y - 1, target, dest);
floodfill_r(grey, width, height, x, y + 1, target, dest);

return 0;
}

if someone wanted a simpler version i would write

void recolorize_pixel_chain(int x, int y)
{
// no bounds check: this presentation version assumes the area is
// enclosed by pixels of other colours, and that
// color_to_replace != replacement_color
if(map[y][x]==color_to_replace)
{
map[y][x]=replacement_color;

recolorize_pixel_chain(x+1, y);
recolorize_pixel_chain(x-1, y);
recolorize_pixel_chain(x, y+1);
recolorize_pixel_chain(x, y-1);
}
}

but for practical coding the one with longer names is more practical
imo, while this one above is more 'presentational'


From Tim Rentsch@21:1/5 to Malcolm McLean on Tue Mar 19 21:19:19 2024

Malcolm McLean <[email protected]> writes:

On 19/03/2024 05:54, Tim Rentsch wrote:

Malcolm McLean <[email protected]> writes:

On 18/03/2024 09:30, Tim Rentsch wrote:

Michael S <[email protected]> writes:

[...]

Except I don't understand why it works at all.
Can't the fill area have sub-areas that are connected only through
a diagonal?

It is customary in raster graphics to count pixels as adjacent
only if they share an edge, not if they just share a corner.
Usually that gives better results; the exceptions tend to need
special handling anyway and not just connecting through
diagonals.

Though with a binary image, if the foreground is 4-connected, the
background must therefore be 8-connected.

It might be but it doesn't have to be.

Also different terminology should be used, since 4-connected
(also N-connected, for other integer N) has a specific meaning in
graph theory, and one very different than what is meant above.

That is the terminology in binary image processing. The pixels are 4-connected or 8-connected depending on whether a shared corner is
considered to make the group of pixels two objects or one object.

A poor choice of terminology. Side adjacent or corner and side
adjacent would be better.


From Tim Rentsch@21:1/5 to Michael S on Tue Mar 19 21:43:33 2024

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented it.
It wasn't me.)

Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call and
the data. And todo is actually a queue rather than a stack.

Now why would it be slower? Probably because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, while you consider it 5 times for Tim's - once
for each neighbour and once for itself. Then horizontally adjacent
pixels are more likely to be in the same cache line than vertically
adjacent pixels, so processing images in scan lines tends to be a bit
faster.
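The scan-line idea described above can be sketched as follows. This is not Malcolm's code, which is not shown in this part of the thread; the name scanline_fill and the use of a stack of seeds (rather than the queue he mentions) are assumptions made to keep the example short.

```c
#include <stdlib.h>

typedef struct { int x, y; } Seed;

/* Returns the number of recoloured pixels, or -1 on allocation failure. */
int scanline_fill(unsigned char *pixels, int w, int h, int x0, int y0,
                  unsigned char old_c, unsigned char new_c)
{
    if (old_c == new_c || x0 < 0 || x0 >= w || y0 < 0 || y0 >= h) return 0;
    if (pixels[y0 * w + x0] != old_c) return 0;

    size_t cap = 64, top = 0;
    Seed *stack = malloc(cap * sizeof *stack);
    if (!stack) return -1;
    stack[top++] = (Seed){ x0, y0 };
    int count = 0;

    while (top > 0) {
        Seed s = stack[--top];
        unsigned char *row = pixels + (size_t)s.y * w;
        if (row[s.x] != old_c) continue;      /* span already filled */

        /* Extend the seed into a full horizontal span and recolour it. */
        int left = s.x, right = s.x;
        while (left > 0 && row[left - 1] == old_c) left--;
        while (right + 1 < w && row[right + 1] == old_c) right++;
        for (int x = left; x <= right; x++) row[x] = new_c;
        count += right - left + 1;

        /* Look at the rows above and below: one seed per run of old_c. */
        for (int dy = -1; dy <= 1; dy += 2) {
            int ny = s.y + dy;
            if (ny < 0 || ny >= h) continue;
            unsigned char *nrow = pixels + (size_t)ny * w;
            int x = left;
            while (x <= right) {
                if (nrow[x] != old_c) { x++; continue; }
                if (top == cap) {
                    Seed *t = realloc(stack, 2 * cap * sizeof *t);
                    if (!t) { free(stack); return -1; }
                    stack = t;
                    cap *= 2;
                }
                stack[top++] = (Seed){ x, ny };
                while (x <= right && nrow[x] == old_c) x++;   /* skip the run */
            }
        }
    }
    free(stack);
    return count;
}
```

Each pixel is written once as part of its span and read again when the rows above and below are scanned for seeds, which is the reason given above for fewer touches per pixel than a neighbour-by-neighbour fill.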

Below is a variant of recursive algorithm that is approximately as
fast as your code (1.25x faster for filling solid rectangle, 1.43x
slower for filling snake shape).
The code is a bit long, but I hope that the logic is still obvious and
there is no need to prove correctness. [...]

To me it looks like this recursive algorithm doesn't find all
pixels that need coloring in some situations.


From Tim Rentsch@21:1/5 to Michael S on Tue Mar 19 21:40:22 2024

Michael S <[email protected]> writes:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    UI k = 0;
    UI n = 17;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) )
        todo[k++] = p0;

    while( k > 0 ){
        Index j = n-k;
        memmove( todo + j, todo, k * sizeof *todo );
        k = 0;

        while( j < n ){
            Point p = todo[ j++ ];
            for( Index i = 0; i < 4; i++ ){
                Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                todo[ k++ ] = q;
            }

            if( j-k < 3 ){
                Index new_n = n+n/4;
                Index new_j = new_n - (n-j);
                Point *t = realloc( todo, new_n * sizeof *t );
                if( !t ){ k = 0; break; }
                memmove( t + new_j, t + j, (n-j) * sizeof *t );
                todo = t, n = new_n, j = new_j;
            }
        }
    }

    free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
}

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.

Slower with some shapes, faster in others. In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.

Is it the same algorithm?

Sorry, the same algorithm as what? The same as Malcolm's?
Definitely not. The same as my other posting that does
not do dynamic reallocation? Yes in the sense that if the
allocated array is large enough to begin with then no
reallocations are needed.

Besides, I don't think that use of VLA in library code is a good idea.
VLA is optional in latest C standards. And incompatible with C++.

The code uses a variably modified type, not a variable length
array. Again, the choice is for clarity of presentation. If
someone wants to get rid of the variably modified types, it's
very easy to do, literally a five minute task. Anyway the
interface is poorly designed to start with, there are bigger
problems than just whether a variably modified type is used.
(I chose the interface I did to approximate the interface
used in Malcolm's code.)

If someone wants to use the functionality from C++, it's
easy enough to write a C wrapper function to do that.
IMO C++ has diverged sufficiently from C so that there
is little to be gained by trying to make code interoperable
between the two languages.


From Michael S@21:1/5 to fir on Wed Mar 20 10:41:55 2024

On Wed, 20 Mar 2024 01:13:10 +0100
fir <[email protected]> wrote:

i asked the topic here as i felt i got no time to rethink if it will
blow my program or not but that 30 minutes task was for 30 minutes
not for a multi hour discussion

So you got the answer rather quickly and the answer is:
"Yes, in the worst case it can consume a lot of stack. Don't use this
simple and elegant algorithm unless you have full control over the size
of the images, the size of the stack, and the size of the stack frame
the compiler generates for each recursive call."
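To put numbers on the stack question, the depth of the simple recursive fill can be measured directly. The sketch below is an instrumented variant, not code from the thread; DepthProbe, measure_fill_depth and estimate_stack_bytes are hypothetical names, and the frame size passed to the estimate is a guess that has to be checked against what the compiler actually emits.

```c
#include <stddef.h>

typedef struct {
    unsigned char *pixels;
    int w, h;
    unsigned char old_c, new_c;
    long depth, max_depth;   /* current and deepest nesting of recolouring calls */
} DepthProbe;

static void probe_fill(DepthProbe *p, int x, int y)
{
    if (x < 0 || x >= p->w || y < 0 || y >= p->h) return;
    if (p->pixels[y * p->w + x] != p->old_c) return;

    p->pixels[y * p->w + x] = p->new_c;
    if (++p->depth > p->max_depth) p->max_depth = p->depth;
    probe_fill(p, x + 1, y);
    probe_fill(p, x - 1, y);
    probe_fill(p, x, y - 1);
    probe_fill(p, x, y + 1);
    p->depth--;
}

/* Fills like the simple recursive version and returns the deepest chain of
   nested calls that each recoloured a pixel (calls that return at once add
   one more frame on top of that). */
long measure_fill_depth(unsigned char *pixels, int w, int h, int x, int y,
                        unsigned char old_c, unsigned char new_c)
{
    DepthProbe p = { pixels, w, h, old_c, new_c, 0, 0 };
    if (old_c != new_c) probe_fill(&p, x, y);
    return p.max_depth;
}

/* Rough estimate: one stack frame of frame_bytes per nested call. */
size_t estimate_stack_bytes(long max_depth, size_t frame_bytes)
{
    return (size_t)max_depth * frame_bytes;
}
```

In the worst case the fill path snakes through every pixel, so the depth can reach width times height. For a 200x200 image that is 40,000 nested calls; with an assumed 64 bytes per frame the estimate is 2,560,000 bytes, which is in the range of common default stack sizes and therefore not safely below them.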


From fir@21:1/5 to fir on Wed Mar 20 09:39:41 2024

fir wrote:

inline void RecolorizePixelAndSpawnNewPixelArea(int x, int y)
{

as i use word area in double meaning here it should be renamed like

inline void RecolorizePixelAndSpawnNewPixelImmediateVicinity(int x, int y)

(generally i use almost such long function names in my codes but do not
write comments at all,
as everything is then self commenting imo.. with variable names i
use shorter, but sometimes it seems that some can also be a bit longer
like here list_of_pixels_bot etc)


From David Brown@21:1/5 to bart on Wed Mar 20 09:29:54 2024

On 19/03/2024 18:16, bart wrote:

On 19/03/2024 15:05, fir wrote:

if this is 100x100 square and i put the initiation
in middle it would go 50x right then at depth 50
it would go one up then i guess 100 times left

then just about this line up until up edge of picture
- then it probably revert back (with a lot
of false ifs) to first line and then go down

That's what I thought until I tried it.

If I start with an 18x18 image of all zeros, then fill starting from the
centre with a 'colour' that is an incrementing value, then the final
image displayed as a table of integers looks like this:

171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81
 63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46
 28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10
172 173 174 175 176 177 178 179 180   1   2   3   4   5   6   7   8   9
209 210 211 212 213 214 215 216 181 182 183 184 185 186 187 188 189 190
208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191
217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235
253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271
289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307

By following the sequence starting from 1, you can see the fill-pattern.

It's not clear how it gets from 171 at top left to 172 half-way down the
left edge.

After the sequence hits the end at 171, it backtracks down the numbers.
27 is the first it reaches where there is a zero square neighbour, so it
goes down from there - and the next number in the sequence is 172. Then
it is free to move to the right again (then down after moving right is
blocked at 180).

I think your posts here give a very nice and clear way to view the
working of the algorithm. Thanks for doing that.
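The numbering experiment above can be reproduced with a few lines of C. This is a sketch, not bart's program: it assumes the neighbour order of the first post in the thread (x+1, x-1, y-1, y+1), an image stored row by row with the top row first, and a start at column 9, row 9 counting from zero. number_fill and number_fill_from are hypothetical names.

```c
/* Recursive fill that writes an incrementing counter instead of one colour.
   A cell counts as unfilled while it is still zero. */
static void number_fill(int *img, int w, int h, int x, int y, int *counter)
{
    if (x < 0 || x >= w || y < 0 || y >= h) return;
    if (img[y * w + x] != 0) return;

    img[y * w + x] = ++*counter;
    number_fill(img, w, h, x + 1, y, counter);   /* right */
    number_fill(img, w, h, x - 1, y, counter);   /* left  */
    number_fill(img, w, h, x, y - 1, counter);   /* up    */
    number_fill(img, w, h, x, y + 1, counter);   /* down  */
}

/* Numbers every reachable zero cell starting at (x, y) and returns how many
   cells were numbered. */
int number_fill_from(int *img, int w, int h, int x, int y)
{
    int counter = 0;
    number_fill(img, w, h, x, y, &counter);
    return counter;
}
```

Printing the 18x18 array row by row after number_fill_from(img, 18, 18, 9, 9) shows the order in which the cells were reached, including the jump from the top-left corner back to the cell below 27 that is explained above.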


From fir@21:1/5 to Michael S on Wed Mar 20 09:27:47 2024

Michael S wrote:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

On 19/03/2024 11:18, Michael S wrote:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    UI k = 0;
    UI n = 17;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) )
        todo[k++] = p0;

    while( k > 0 ){
        Index j = n-k;
        memmove( todo + j, todo, k * sizeof *todo );
        k = 0;

        while( j < n ){
            Point p = todo[ j++ ];
            for( Index i = 0; i < 4; i++ ){
                Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                todo[ k++ ] = q;
            }

            if( j-k < 3 ){
                Index new_n = n+n/4;
                Index new_j = new_n - (n-j);
                Point *t = realloc( todo, new_n * sizeof *t );
                if( !t ){ k = 0; break; }
                memmove( t + new_j, t + j, (n-j) * sizeof *t );
                todo = t, n = new_n, j = new_j;
            }
        }
    }

    free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
}

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.
Is it the same algorithm?

No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented it.
It wasn't me.)

Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call and
the data. And todo is actually a queue rather than a stack.

Now why would it be slower? Probably because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, whilst you consider it 5 times for Tim's - once
for each neighbour and once for itself. Then horizontally adjacent
pixels are more likely to be in the same cache line than vertically
adjacent pixels, so processing images in scan lines tends to be a bit
faster.

I did a little more investigation gradually modifying Tim's code for
improved performance without changing the basic principle of the
algorithm. Yes, micro-optimization. Yes, I said earlier that doing so
in c.l.c it is bad sportsmanship. So what? I never claimed to be an
ideal sportsman.
The point is that after optimizations it's actually faster than the
best implementations of the original recursive algorithm, including an
implementation that uses an explicit stack and is quite economical in
its memory consumption. Tim's algorithm is 8 times less economical (8
bytes per saved node vs 1 byte in the explicit stack) and nevertheless
almost twice as fast for both shapes that I was testing.
So far, this algorithm is fastest among all "local" algorithms that I
tried. By "local" I mean algorithms that don't try to recolor more than
one pixel at time.
"Non-local" algorithms i.e. yours and my recursive algorithm that
recolors St. George cross, are somewhat faster, but I suspect that
it's because all shapes that I use for testing have either long
columns or long rows or both.
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost nothing
else.
The second nice thing is that it is easy to understand. Not as easy as
the original recursive method, but easier than the rest of them.

If you or somebody else is interested, here is [micro]optimized variant:

#include <stdlib.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char Color;
typedef int UI;
typedef struct { UI x, y; } Point;

static inline
Point* circularIncr(Point* p, Point* beg, Point* end) {
return p + 1 == end ? beg : p + 1;
}

static inline
Point mk_point(int x, int y) {
Point pt={x,y};
return pt;
}

int floodfill_r(
Color *pixels,
int w, int h,
int pt0_x, int pt0_y,
Color old, Color new)
{
if (pt0_x < 0 || pt0_x >= w || pt0_y < 0 || pt0_y >= h)
return 0;

if (pixels[pt0_y*w+pt0_x] != old)
return 0;

pixels[pt0_y*w+pt0_x] = new;

const ptrdiff_t INITIAL_TODO_SIZE = 125;
Point *todo = malloc( (INITIAL_TODO_SIZE+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if (!todo)
return -1;
Point* todo_end = &todo[INITIAL_TODO_SIZE];

todo[0] = mk_point(pt0_x, pt0_y);
Point* wr = &todo[1];
Point* rd = todo;
ptrdiff_t free_space = INITIAL_TODO_SIZE - 1;
do {
Point pt = *rd;
rd = circularIncr(rd, todo, todo_end);
Point* prev_wr = wr;
if (pt.x > 0 && pixels[pt.y*w+pt.x-1] == old) {
pixels[pt.y*w+pt.x-1] = new;
*wr++ = mk_point(pt.x-1, pt.y);
}
if (pt.y > 0 && pixels[pt.y*w+pt.x-w] == old) {
pixels[pt.y*w+pt.x-w] = new;
*wr++ = mk_point(pt.x, pt.y-1);
}
if (pt.x+1 < w && pixels[pt.y*w+pt.x+1] == old) {
pixels[pt.y*w+pt.x+1] = new;
*wr++ = mk_point(pt.x+1, pt.y);
}
if (pt.y+1 < h && pixels[pt.y*w+pt.x+w] == old) {
pixels[pt.y*w+pt.x+w] = new;
*wr++ = mk_point(pt.x, pt.y+1);
}

free_space += 1 - (wr - prev_wr);
if (wr >= todo_end) {
memcpy(todo, todo_end, (wr - todo_end)*sizeof(*wr));
wr += todo - todo_end;
}

if (free_space < 4) {
ptrdiff_t rdi = rd-todo;
ptrdiff_t wri = wr-todo;
ptrdiff_t sz = todo_end - todo;
ptrdiff_t incr = sz/4;
Point* new_todo = realloc(todo, (sz+incr+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if(!new_todo) {
free(todo);
return -1;
}
free_space += incr;
rd = &new_todo[rdi];
wr = &new_todo[wri];
todo = new_todo;
todo_end = &todo[sz+incr];
if (rd >= wr) {
memmove(&rd[incr], rd, (sz-rdi) * sizeof *todo );
rd = &rd[incr];
}
}
} while (rd != wr);

free( todo );
return 1;
}

if i would write it non recursive it probably would be something like that

[1] (in static table based simpler way; generally
in last years i prefer using realloc based resizable
ones so i would need yet to rewrite)
[2] not tested but it is a draft of that code as i would attempt to write
it (some like short names so would change "list_of_pixels"
into "pixels" etc)

enum { list_of_pixels_max = 10*1000*1000 }; // enum, so the array size is a constant expression in C
struct {int x, y;} list_of_pixels[list_of_pixels_max];

int list_of_pixels_top = 0;
int list_of_pixels_bot = 0; //pointer to element to consume

inline void AddPixelToList(int x, int y)
{
list_of_pixels[list_of_pixels_top].x = x;
list_of_pixels[list_of_pixels_top].y = y;
list_of_pixels_top++;

// if(list_of_pixels_top>=list_of_pixels_max) ERROR_EXIT("overflow in list of pixels")

}

int color_to_replace = 0;
int replacing_color = 0;

inline void RecolorizePixelAndSpawnNewPixelArea(int x, int y)
{
if(!IsInFrame(x,y)) return;

int color_here = GetPixelUnsafe(x,y);
if(color_here==color_to_replace)
{
SetPixelUnsafe(x,y, replacing_color);
AddPixelToList( x+1, y);
AddPixelToList( x-1, y);
AddPixelToList( x, y+1);
AddPixelToList( x, y-1);
}
}

void RecolorizePixelArea(int x, int y, int color_to_replace_, int replacing_color_)
{
color_to_replace = color_to_replace_;
replacing_color = replacing_color_;

list_of_pixels_top = 0;
list_of_pixels_bot = 0;

RecolorizePixelAndSpawnNewPixelArea(x,y);

while(list_of_pixels_bot<list_of_pixels_top)
{

RecolorizePixelAndSpawnNewPixelArea(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);
list_of_pixels_bot++;
}

}


From Michael S@21:1/5 to Tim Rentsch on Wed Mar 20 10:56:47 2024

On Tue, 19 Mar 2024 21:43:33 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented
it. It wasn't me.)

Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call
and the data. And todo is actually a queue rather than a stack.

Now why would it be slower? Probably because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, while you consider it 5 times for Tim's -
once for each neighbour and once for itself. Then horizontally
adjacent pixels are more likely to be in the same cache line than
vertically adjacent pixels, so processing images in scan lines
tends to be a bit faster.

Below is a variant of recursive algorithm that is approximately as
fast as your code (1.25x faster for filling solid rectangle, 1.43x
slower for filling snake shape).
The code is a bit long, but I hope that the logic is still obvious
and there is no need to prove correctness. [...]

To me it looks like this recursive algorithm doesn't find all
pixels that need coloring in some situations.

Last night I had a few doubts myself, but after further thought I
came to the conclusion that it works in all situations.


From fir@21:1/5 to fir on Wed Mar 20 09:51:02 2024

fir wrote:

RecolorizePixelAndSpawnNewPixelArea(x,y);

while(list_of_pixels_bot<list_of_pixels_top)
{

RecolorizePixelAndSpawnNewPixelArea(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);

list_of_pixels_bot++;
}

}

maybe this is an example of a case when do {} while() would work better
than while,

do {
RecolorizePixelAndSpawnNewPixelImmediateVicinity(list_of_pixels[list_of_pixels_bot].x,list_of_pixels[list_of_pixels_bot].y);
list_of_pixels_bot++;
} while(list_of_pixels_bot<list_of_pixels_top);

but that would need to be checked as such loops are confusing

if so i think it is a somewhat general scheme

do {something();} while(bot<top)

of rewriting recursion probably
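A self-contained sketch of that do/while scheme is given below. It is not fir's code: worklist_fill and Pixel are hypothetical names, the start pixel is put on the list before the loop so that the first iteration has something to consume, and a pixel is recoloured at the moment it is listed so that the list never needs more than width times height entries.

```c
#include <stdlib.h>

typedef struct { int x, y; } Pixel;

/* Returns the number of recoloured pixels, or -1 if the list cannot be
   allocated. */
int worklist_fill(unsigned char *pixels, int w, int h, int x, int y,
                  unsigned char old_c, unsigned char new_c)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };

    if (old_c == new_c || x < 0 || x >= w || y < 0 || y >= h) return 0;
    if (pixels[y * w + x] != old_c) return 0;

    /* Each pixel is listed at most once, so w*h entries always suffice. */
    Pixel *list = malloc((size_t)w * h * sizeof *list);
    if (!list) return -1;

    int bot = 0;   /* next entry to consume */
    int top = 0;   /* next free slot */
    pixels[y * w + x] = new_c;
    list[top++] = (Pixel){ x, y };

    do {
        Pixel p = list[bot++];
        for (int d = 0; d < 4; d++) {
            int nx = p.x + dx[d], ny = p.y + dy[d];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            if (pixels[ny * w + nx] != old_c) continue;
            pixels[ny * w + nx] = new_c;        /* recolour when listed */
            list[top++] = (Pixel){ nx, ny };
        }
    } while (bot < top);

    free(list);
    return top;
}
```

The loop body replaces one level of the recursion: where the recursive version calls itself, this version appends to the list, and the bot index plays the role of the return from a call.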


From Michael S@21:1/5 to fir on Wed Mar 20 12:06:04 2024

On Wed, 20 Mar 2024 00:30:56 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all, just make it
standalone static variables because they are the same for the whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.

same thread safety as previous,

The same as your previous.
But I was modifying Malcolm's recursive variant rather than yours.
Malcolm's was thread-safe.


From Michael S@21:1/5 to Tim Rentsch on Wed Mar 20 11:54:16 2024

On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.

typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color, Color );

void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old, Color new ){
    static const Point deltas[4] = { {1,0}, {0,1}, {-1,0}, {0,-1}, };
    UI k = 0;
    UI n = 17;
    Point *todo = malloc( n * sizeof *todo );

    if( todo && change_it( w, h, pixels, p0, old, new ) )
        todo[k++] = p0;

    while( k > 0 ){
        Index j = n-k;
        memmove( todo + j, todo, k * sizeof *todo );
        k = 0;

        while( j < n ){
            Point p = todo[ j++ ];
            for( Index i = 0; i < 4; i++ ){
                Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
                if( ! change_it( w, h, pixels, q, old, new ) ) continue;
                todo[ k++ ] = q;
            }

            if( j-k < 3 ){
                Index new_n = n+n/4;
                Index new_j = new_n - (n-j);
                Point *t = realloc( todo, new_n * sizeof *t );
                if( !t ){ k = 0; break; }
                memmove( t + new_j, t + j, (n-j) * sizeof *t );
                todo = t, n = new_n, j = new_j;
            }
        }
    }

    free( todo );
}

_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old, Color new ){
    if( p.x >= w || p.y >= h || pixels[p.x][p.y] != old ) return 0;
    return pixels[p.x][p.y] = new, 1;
}

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.

Slower with some shapes, faster in others.

In my small test suite I found no cases where this specific code is
measurably faster than Malcolm's code.
I did find one case in which they are approximately equal. I call it
"slalom shape" and it's more or less designed to be the worst case for
algorithms that try to speed themselves up by taking advantage of
straight lines.
The slalom shape is generated by the following code:

static
void make_slalom(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
const int n_col = width/3;
const int n_row = (height-3)/4;

// top row
// P B B P P P
for (int col = 0; col < n_col; ++col) {
unsigned char c = (col & 1)==0 ? background_c : pen_c;
image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
}

// main image: consists of 3x4 blocks filled by following pattern
// P B B
// P P B
// B P B
// P P B
for (int row = 0; row < n_row; ++row) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[(row*4+1)*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c;
p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
p += width;
p[0] = background_c; p[1] = pen_c; p[2] = background_c;
p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
}
}

// near-bottom rows
// P B B
for (int y = n_row*4+1; y < height-1; ++y) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[y*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c;
}
}

// bottom row - all P
memset(&image[(height-1)*width], pen_c, width);

// rightmost columns
for (int x = n_col*3; x < width; ++x) {
for (int y = 0; y < height-1; ++y)
image[y*width+x] = background_c;
}
}

In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.

Yes, your code is easy to understand. It could have been easier still if
the persistent indices had more meaningful names.
In another post I showed an optimized variant of your algorithm:
- 4-neighbors loop unrolled. The majority of the speed-up comes not from
unrolling itself, but from specialization of the in-rectangle check
enabled by the unroll.
- Todo queue implemented as circular buffer.
- Initial size of queue increased.
This optimized variant is more competitive with 'line-grabby'
algorithms in filling solid shapes and faster than them in 'slalom'
case.

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Is it the same algorithm?

Sorry, the same algorithm as what? The same as Malcolm's?

Yes, that's what I meant.
I still haven't found the guts to try to understand what Malcolm's code
is doing.

Definitely not. The same as my other posting that does
not do dynamic reallocation? Yes in the sense that if the
allocated array is large enough to begin with then no
reallocations are needed.

Besides, I don't think that use of VLA in library code is a good
idea. VLA is optional in latest C standards. And incompatible with
C++.

The code uses a variably modified type, not a variable length
array.

I am not sufficiently versed in C Standard terminology to see a
difference.
Aren't they both introduced in C99 and made optional in later standards?

Again, the choice is for clarity of presentation. If
someone wants to get rid of the variably modified types, it's
very easy to do, literally a five minute task.

Yes, that's what it took for me.
But I knew that variably modified types exist, even if I didn't know
that they are called such.
OTOH, many (majority?) of C programmers never heard about them.

Anyway the
interface is poorly designed to start with, there are bigger
problems than just whether a variably modified type is used.
(I chose the interface I did to approximate the interface
used in Malcolm's code.)

That's true.
The biggest problem of Malcolm's interface is that the logical width of
the image is the same as the physical width (a.k.a. line pitch; in LAPACK
it is called the leading dimension). These parameters should be separate.
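What separating the two widths buys can be shown with a short sketch. ImageView, sub_view, pixel_at and fill_view are hypothetical names, not part of any interface posted in this thread; the point is only that once the pitch is its own parameter, a rectangle inside a larger image can be handled as an image in its own right.

```c
#include <stddef.h>

typedef struct {
    unsigned char *data;   /* address of the view's top-left pixel */
    int width, height;     /* logical size of the view */
    ptrdiff_t pitch;       /* pixels between vertically adjacent pixels */
} ImageView;

/* A rectangle inside `parent`; it shares the parent's storage and pitch. */
ImageView sub_view(ImageView parent, int x, int y, int width, int height)
{
    ImageView v = { parent.data + y * parent.pitch + x, width, height,
                    parent.pitch };
    return v;
}

/* Address of pixel (x, y) of the view; rows are stepped by pitch, not width. */
unsigned char *pixel_at(ImageView v, int x, int y)
{
    return v.data + y * v.pitch + x;
}

/* Sets every pixel of the view to c without touching the parent elsewhere. */
void fill_view(ImageView v, unsigned char c)
{
    for (int y = 0; y < v.height; y++)
        for (int x = 0; x < v.width; x++)
            *pixel_at(v, x, y) = c;
}
```

A fill routine written against such a view would work unchanged on a whole image (pitch equal to width) and on a clipped region of it (pitch larger than width).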

If someone wants to use the functionality from C++, it's
easy enough to write a C wrapper function to do that.
IMO C++ has diverged sufficiently from C so that there
is little to be gained by trying to make code interoperable
between the two languages.

From the practical perspective, the biggest obstacle is that your code
can't be compiled with popular Microsoft compilers.


From Ben Bacarisse@21:1/5 to Michael S on Wed Mar 20 10:23:45 2024

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

...

Tim Rentsch <[email protected]> writes:

...

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
Color );

Besides, I don't think that use of VLA in library code is a good
idea. VLA is optional in latest C standards. And incompatible with
C++.

The code uses a variably modified type, not a variable length
array.

I am not sufficiently versed in C Standard terminology to see a
difference.

A VLA is a declared object -- an array with a size that is not a
compile-time constant. A variably modified type is just a type, not an
object. Obviously one can use such a type to declare a VLA, but when it
is the type of a function parameter, there need be no declared object
with that type. Usually the associated function argument will have been
dynamically allocated.

Aren't they both introduced in C99 and made optional in later
standards?

I think so but that's a shame since VMTs are very helpful for writing
array code. They avoid the need to keep calculating the index with
multiplications.

Making both optional was a classic case of throwing the baby out with
the bath water. Few of the objections raised about VLAs apply to VMTs.
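A minimal sketch of the distinction, assuming a compiler that supports variably modified types (sum_grid and demo_sum are hypothetical names): the parameter and the local pointer below both have variably modified types, yet the only array storage comes from malloc, so no variable length array object is ever declared.

```c
#include <stdlib.h>

/* `grid` has a variably modified type: its row length depends on cols.
   The compiler does the r*cols + c index arithmetic. */
long sum_grid(size_t rows, size_t cols, int grid[rows][cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r][c];
    return total;
}

/* Allocates a rows*cols grid on the heap, fills it with 0, 1, 2, ... and
   returns the sum of its elements, or -1 if allocation fails. */
long demo_sum(size_t rows, size_t cols)
{
    /* Pointer to a row of cols ints: a variably modified type, not a VLA. */
    int (*grid)[cols] = malloc(rows * sizeof *grid);
    if (!grid) return -1;

    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            grid[r][c] = (int)(r * cols + c);

    long total = sum_grid(rows, cols, grid);
    free(grid);
    return total;
}
```

Since nothing of variable size is placed on the stack, the usual stack-overflow objection to VLAs does not apply to this style.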

--
Ben.


From fir@21:1/5 to Michael S on Wed Mar 20 13:44:04 2024

Michael S wrote:

On Wed, 20 Mar 2024 00:30:56 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all, just make it
standalone static variables because they are the same for the whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.

same thread safety as previous,

The same as your previous.
But I was modifying Malcolm's recursive variant rather than yours.
Malcolm's was thread-safe.

sure, i dont always read into peoples code.. i just wanted to say it
seems you pass this context structure or pointer to it down the stack
each call - it is not necessary


From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 06:51:20 2024

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 21:43:33 -0700
Tim Rentsch <[email protected]> wrote:

[...]

To me it looks like this recursive algorithm doesn't find all
pixels that need coloring in some situations.

Last night I had a few doubts myself, but after further thought I
came to the conclusion that it works in all situations.

Sorry, my bad. I did some experiments to convince myself
the algorithm sometimes doesn't work, but it turns out the
results showed a problem in the experiments rather than
the algorithm. :/


From Michael S@21:1/5 to fir on Wed Mar 20 15:44:38 2024

On Wed, 20 Mar 2024 13:44:04 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:30:56 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all, just make it
standalone static variables because they are the same for the whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.

same thread safety as previous,

The same as your previous.
But I was modifying Malcolm's recursive variant rather than yours.
Malcolm's was thread-safe.

sure, i dont always read into peoples code.. i just wanted to say it
seems you pass this context structure or pointer to it

Yes, pointer. That's the whole point of my modification of
Malcolm's code - to copy one pointer instead of 5 variables that
are never changed.

down the stack
each call - it is not necessary

Not necessary if you don't want it thread-safe. Necessary otherwise.
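The trade-off can be sketched like this. It is not Michael S's or Malcolm's actual code, which is not shown in this part of the thread; FillContext, fill_with_context and recolor_area are hypothetical names. The values that never change during a fill live in one structure owned by the caller, and every recursive call receives only a pointer to it, so two threads filling different images do not share any state.

```c
typedef struct {
    unsigned char *pixels;
    int w, h;
    unsigned char old_c, new_c;
} FillContext;

/* One pointer plus two coordinates are copied per call; the context itself
   is never modified. */
static void fill_with_context(const FillContext *ctx, int x, int y)
{
    if (x < 0 || x >= ctx->w || y < 0 || y >= ctx->h) return;
    if (ctx->pixels[y * ctx->w + x] != ctx->old_c) return;

    ctx->pixels[y * ctx->w + x] = ctx->new_c;
    fill_with_context(ctx, x + 1, y);
    fill_with_context(ctx, x - 1, y);
    fill_with_context(ctx, x, y - 1);
    fill_with_context(ctx, x, y + 1);
}

/* The context is a local variable, so each call of recolor_area, in any
   thread, gets its own copy. */
void recolor_area(unsigned char *pixels, int w, int h, int x, int y,
                  unsigned char old_c, unsigned char new_c)
{
    FillContext ctx = { pixels, w, h, old_c, new_c };
    if (old_c == new_c) return;
    fill_with_context(&ctx, x, y);
}
```

With file-scope variables instead of the context pointer, each call would push less onto the stack, but two fills running at the same time would overwrite each other's colours.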


From fir@21:1/5 to Michael S on Wed Mar 20 15:20:37 2024

Michael S wrote:

On Wed, 20 Mar 2024 01:13:10 +0100
fir <[email protected]> wrote:

i asked the topic here as i felt i got no time to rethink if it will
blow my program or not but that 30 minutes task was for 30 minutes
not for a multi hour discussion

So you got the answer rather quickly and the answer is:
"Yes, in the worst case it can consume a lot of stack. Don't use this
simple and elegant algorithm unless you have full control over the size
of the images, the size of the stack, and the size of the stack frame
the compiler generates for each recursive call."

ye, my conclusion here would rather be

put stack to 100 or even 150 MB and forget... then worry if the code
(of recolorisation) works too slow

(i know however this is a potential bug, as if someone would then want to
recolorise a very big area there still would be stack overflow, but
this is unlikely)


From fir@21:1/5 to Michael S on Wed Mar 20 15:17:32 2024

Michael S wrote:

On Wed, 20 Mar 2024 13:44:04 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:30:56 +0100
fir <[email protected]> wrote:

Michael S wrote:

On Wed, 20 Mar 2024 00:03:04 +0100
fir <[email protected]> wrote:

im not quite sure what you do here.. pass the structure? in fact
the thing you name context you may not pass at all, just make it
standalone static variables because they are the same for the whole
"branch" (given recursive branch of recolorisation)

something like

int old_color = 0xff0000;
int new_color = 0x00ff00;

void RecolorizePixelAndAdjacentPixels(int x, int y)
{
//...
}

Not thread-safe.

same thread safety as previous,

The same as your previous.
But I was modifying Malcolm's recursive variant rather than yours.
Malcolm's was thread-safe.

sure, i dont always read into peoples code.. i just wanted to say it
seems you pass this context structure or pointer to it

Yes, pointer. That's the whole point of my modification of
Malcolm's code - to copy one pointer instead of 5 variables that
are never changed.

down the stack
each call - it is not necessary

Not necessary if you don't want it thread-safe. Necessary otherwise.

okay, if you say so, i dont use threads as i dont like them so i dont
know (and dont want to know) ;c


From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 07:27:38 2024

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

On 19/03/2024 11:18, Michael S wrote:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

Here is the refinement that uses a resizing rather than
fixed-size buffer.

[...]

I did a little more investigation gradually modifying Tim's code
for improved performance without changing the basic principle of
the algorithm. [...]

I appreciate your doing this. I developed independently a
couple of versions along similar lines.

So far, this algorithm is fastest among all "local" algorithms
that I tried. By "local" I mean algorithms that don't try to
recolor more than one pixel at time.

"Non-local" algorithms i.e. yours and my recursive algorithm that
recolors St. George cross, are somewhat faster, [...].

I was confused by this statement at first but now I see that
"yours" refers to Malcolm's algorithm.

The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.

One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.

The second nice thing is that it is easy to understand. Not as
easy as original recursive method, but easier than the rest of
them.

If you or somebody else is interested, here is [micro]optimized
variant: [...]

Good show. I will try to get my latest version posted soon.


From Tim Rentsch@21:1/5 to Ben Bacarisse on Wed Mar 20 07:52:12 2024

Ben Bacarisse <[email protected]> writes:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

...

Tim Rentsch <[email protected]> writes:

...

static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
Color );

Besides, I don't think that use of VLA in library code is a
good idea. VLA is optional in latest C standards. And
incompatible with C++.

The code uses a variably modified type, not a variable length
array.

I am not sufficiently versed in C Standard terminology to see a
difference.

A VLA is a declared object -- an array with a size that is not a
compile-time constant. A variably modified type is just a type,
not an object. Obviously one can use such a type to declare a
VLA, but when it is the type of a function parameter, there need
be no declared object with that type. Usually the associated
function argument will have been dynamically allocated.

Also ordinary local variables can be declared to have a variably
modified type (the type not necessarily having been introduced
separately), often a benefit for code that is dealing with
multi-dimensional arrays.

Aren't they both introduced in C99 and made optional in later
standards?

I think so but that's a shame since VMTs are very helpful for
writing array code. They avoid the need to keep calculating the
index with multiplications.
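
A minimal sketch of the distinction, assuming a compiler that supports variably modified types. The function names sum_field and demo_vmt are made up for illustration. The parameter field has a variably modified type, yet no variable length array object is ever declared: the storage comes from malloc.

```c
#include <assert.h>
#include <stdlib.h>

/* field is a pointer to rows of w ints: a variably modified type.
   Indexing is field[y][x]; the compiler does the y*w + x arithmetic. */
long sum_field(size_t h, size_t w, int field[h][w])
{
    long total = 0;
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            total += field[y][x];
    return total;
}

/* Allocates an h x w field on the heap, fills it with 0, 1, 2, ...
   and returns the sum, or -1 if the allocation fails. */
long demo_vmt(size_t h, size_t w)
{
    assert(h > 0 && w > 0);
    int (*field)[w] = malloc(h * sizeof(int[w]));  /* VMT pointer, not a VLA */
    if (!field)
        return -1;
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            field[y][x] = (int)(y * w + x);
    long total = sum_field(h, w, field);
    free(field);
    return total;
}
```

A call such as demo_vmt(3, 4) sums the values 0 through 11 without any automatic array whose size depends on a run-time value.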

C11 added a pre-defined preprocessor macro __STDC_NO_VLA__, which
implementations can define to be 1 "intended to indicate that the
implementation does not support variable length arrays or variably
modified types." It's amusing to note that an implementation can
support VLAs and VMTs but still define the macro if they are not
intended to be supported. ;)

Making both optional was a classic case of throwing the baby out
with the bath water. Few of the objections raised about VLAs
apply to VMTs.

Agree 100%.

Someone who wants to take a stand on this issue might want to consider
adding the following lines

#if __STDC_NO_VLA__
#error Substandard implementation detected
#endif

at various places around their source code.


From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 10:01:10 2024

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.
[code]

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.

Slower with some shapes, faster in others.

In my small test suite I found no cases where this specific code is
measurably faster than Malcolm's code.

My test cases include pixel fields of 32k by 32k, with for
example filling the entire field starting at the center point.
Kind of a stress test but it turned up some interesting results.

I did find one case in which they are approximately equal. I call
it "slalom shape" and it's more or less designed to be the worst
case for algorithms that try to speed themselves up by taking
advantage of straight lines.
The slalom shape is generated by following code:
[code]

Thanks, I may try that.

In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.

Yes, your code is easy to understand. Could have been easier
still if persistent indices had more meaningful names.

I have a different view on that question. However I take your
point.

In another post I showed an optimized variant of your algorithm:
- 4-neighbors loop unrolled. The majority of the speedup comes not
from the unrolling itself, but from the specialization of the
in-rectangle check enabled by the unroll.
- Todo queue implemented as circular buffer.
- Initial size of queue increased.
This optimized variant is more competitive with 'line-grabby'
algorithms in filling solid shapes and faster than them in
'slalom' case.

Yes, unrolling is an obvious improvement. I deliberately chose a
simple (and non-optimized) method to make it easier to see how it
works. Simple optimizations are left as an exercise for the
reader. :)

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
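
A small measuring sketch for the stack-versus-queue point, assuming an n x n field where every pixel is fillable and the fill starts at the centre. The function peak_pending is a made-up helper, not code from this thread. It marks pixels when they are pushed and reports the largest number of pending entries seen, taking entries either from the front (queue, breadth-first) or from the back (stack, depth-first). It only measures; no particular numbers are claimed here, and the depth-first figure depends on push order and on the shape.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills an n x n field of identical pixels from its centre.
   use_queue != 0: take entries from the front (breadth-first).
   use_queue == 0: take entries from the back (depth-first).
   Returns the peak number of pending entries, or 0 on allocation failure.
   *filled (if not NULL) receives the number of pixels visited.
   Pixels are marked when pushed, so each is pushed at most once and a
   buffer of n*n entries is always large enough. */
size_t peak_pending(size_t n, int use_queue, size_t *filled)
{
    assert(n > 0);
    unsigned char *seen = calloc(n * n, 1);
    size_t *todo = malloc(n * n * sizeof *todo);
    size_t head = 0, tail = 0, peak = 0, count = 0;

    if (!seen || !todo) {
        free(seen);
        free(todo);
        return 0;
    }

    size_t start = (n / 2) * n + n / 2;
    seen[start] = 1;
    todo[tail++] = start;

    while (head != tail) {
        if (tail - head > peak)
            peak = tail - head;
        size_t p = use_queue ? todo[head++] : todo[--tail];
        size_t x = p % n, y = p / n;
        count++;
        /* Push unmarked neighbours: left, up, right, down. */
        if (x > 0 && !seen[p - 1])     { seen[p - 1] = 1; todo[tail++] = p - 1; }
        if (y > 0 && !seen[p - n])     { seen[p - n] = 1; todo[tail++] = p - n; }
        if (x + 1 < n && !seen[p + 1]) { seen[p + 1] = 1; todo[tail++] = p + 1; }
        if (y + 1 < n && !seen[p + n]) { seen[p + n] = 1; todo[tail++] = p + n; }
    }

    free(seen);
    free(todo);
    if (filled)
        *filled = count;
    return peak;
}
```

Calling peak_pending(n, 1, NULL) and peak_pending(n, 0, NULL) for growing n gives a rough feel for how the two orders scale on this one shape.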

Besides, I don't think that use of VLA in library code is a good
idea. VLA is optional in latest C standards. And incompatible
with C++.

The code uses a variably modified type, not a variable length
array.

I am not sufficiently versed in C Standard terminology to see a
difference.
Aren't they both introduced in C99 and made optional in later
standards?

Ben explained the difference. I posted a short followup to his
explanation. And yes, as of C11 VLAs and VMTs are both optional
(it would be nice if a new C standard put back the requirement
of variably modified types).

Again, the choice is for clarity of presentation. If
someone wants to get rid of the variably modified types, it's
very easy to do, literally a five minute task.

Yes, that's what it took for me.
But I knew that variably modified types exist, even if I didn't know
that they are called such.
OTOH, many (majority?) of C programmers never heard about them.

Something that surprised me is that some C programmers don't
know what compound literals are, even though they have been
around more than 20 years. I'm not inclined to try to cater
to people who program in C but aren't at least aware of what
was done more than 20 years ago.

Anyway the interface is poorly designed to start with, [...]

That's true. [...]

Yes! Hoo rah!

If someone wants to use the functionality from C++, it's
easy enough to write a C wrapper function to do that.
IMO C++ has diverged sufficiently from C so that there
is little to be gained by trying to make code interoperable
between the two languages.

From the practical perspective, the biggest obstacle is that your
code can't be compiled with popular Microsoft compilers.

Some people might consider that a plus rather than a minus. ;)


From Tim Rentsch@21:1/5 to Michael S on Wed Mar 20 10:26:58 2024

Michael S <[email protected]> writes:

[...]

I did a little more investigation gradually modifying Tim's code
for improved performance without changing the basic principle of
the algorithm. [...]

Here is a rendition of my latest and fastest refinement.

#include <stdlib.h>
#include <string.h>  /* memcpy */

typedef unsigned char UC;
typedef unsigned UI;
typedef unsigned U32;
typedef unsigned long long U64;
typedef struct { UI x, y; } Point;

void
faster_fill( UI w0, UI h0, UC pixels[], Point p0, UC old, UC new ){
    U64 const w = w0;
    U64 const h = h0;
    U64 const xm = w-1;
    U64 const ym = h-1;

    U64 j = 0;
    U64 k = 0;
    U64 n = 1u << 10;
    U64 m = n-1;
    U32 *todo = malloc( n * sizeof *todo );
    U64 x = p0.x;
    U64 y = p0.y;

    /* Nothing to do (or no memory): release the buffer before leaving. */
    if( !todo || x >= w || y >= h || pixels[ x*h+y ] != old || old == new ){
        free( todo );
        return;
    }

    pixels[ x*h+y ] = new;  /* recolor the seed, so an isolated pixel is filled too */
    todo[ k++ ] = x<<16 | y;

    while( j != k ){
        U64 used = j < k ? k-j : k+n-j;
        U64 open = n - used;
        if( open < used / 16 ){
            /* Double the circular buffer and move a wrapped tail segment up. */
            U64 new_n = n*2;
            U64 new_m = new_n-1;
            U64 new_j = j < k ? j : j+n;
            U32 *t = realloc( todo, new_n * sizeof *t );
            if( ! t ) break;
            if( j != new_j ) memcpy( t+new_j, t+j, (n-j) * sizeof *t );
            todo = t, n = new_n, m = new_m, j = new_j, open = n-used;
        }

        U64 const jx = used <= 3*open ? k : j+open/3 &m;
        while( j != jx ){
            UI p = todo[j]; j = j+1 &m;
            x = p >> 16, y = p & 0xFFFF;
            if( x > 0 && pixels[ x*h-h + y ] == old ){
                todo[k] = x-1<<16 | y, k = k+1&m, pixels[ x*h-h +y ] = new;
            }
            if( y > 0 && pixels[ x*h + y-1 ] == old ){
                todo[k] = x<<16 | y-1, k = k+1&m, pixels[ x*h +y-1 ] = new;
            }
            if( x < xm && pixels[ x*h+h + y ] == old ){
                todo[k] = x+1<<16 | y, k = k+1&m, pixels[ x*h+h +y ] = new;
            }
            if( y < ym && pixels[ x*h + y+1 ] == old ){
                todo[k] = x<<16 | y+1, k = k+1&m, pixels[ x*h +y+1 ] = new;
            }
        }
    }

    free( todo );
}


From Michael S@21:1/5 to Tim Rentsch on Thu Mar 21 15:36:45 2024

On Wed, 20 Mar 2024 10:26:58 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

I did a little more investigation gradually modifying Tim's code
for improved performance without changing the basic principle of
the algorithm. [...]

Here is a rendition of my latest and fastest refinement.

WOW, you really opened up your bag of tricks!
Power-of-two sized circular buffers is something that I tend to use on
smaller systems, like DSPs or MCUs rather than on "big" computers. But,
of course, on "big" computers it also helps.
Packing {x,y} into a 32-bit word is a bit dirty. I'd guess that we can
justify it by claiming that the original code also has a similar
limitation of width*height <= INT_MAX.
Removal of FIFO empty and almost-full tests in the inner loop helps
solid shapes, but appears to slow down "drawn" shapes. Since solid
shapes are the slowest to fill, it is probably a good trade-off.

Overall, it is faster than my implementation of your algorithm. Esp. so
for solid shapes. Esp. of esp. so on Intel Skylake CPUs where speed up
is up to 1.75x.

More complicated 'St. George Cross' algorithms are still faster for
solid shapes and for shapes dominated by long horizontal or long
vertical lines. But they are ... well ... more complicated.
And [on Skylake] their worst case ('slalom' shape) is somewhat slower
in absolute sense than the worst case of your code (a solid bar).


From Tim Rentsch@21:1/5 to Michael S on Thu Mar 21 09:47:15 2024

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:26:58 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

I did a little more investigation gradually modifying Tim's code
for improved performance without changing the basic principle of
the algorithm. [...]

Here is a rendition of my latest and fastest refinement.

WOW, you really opened up your bag of tricks!

That I did, that I did. :)

I can do this kind of stuff when I need to. Usually I don't need
to.

Power-of-two sized circular buffers is something that I tend to
use on smaller systems, like DSPs or MCUs rather than on "big"
computers. But, of course, on "big" computers it also helps.

Bitwise '&' is simply faster than '%'. Also, bitwise '&' works
on unsigned types in the event that there is wraparound, but '%'
probably doesn't.
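
A minimal sketch of one reading of that remark, assuming a free-running 32-bit counter that is reduced to a slot number only when the buffer is accessed. The helper names slot_mask and slot_mod are made up. With a power-of-two size the masked slot sequence stays continuous when the counter wraps from UINT32_MAX to 0; with '%' and a size that is not a power of two, it does not.

```c
#include <assert.h>
#include <stdint.h>

/* Slot in a ring of n entries, n a power of two: a single AND. */
uint32_t slot_mask(uint32_t counter, uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return counter & (n - 1);
}

/* Slot in a ring of any non-zero size n: needs a division. */
uint32_t slot_mod(uint32_t counter, uint32_t n)
{
    assert(n != 0);
    return counter % n;
}
```

For a ring of 8 entries, counter UINT32_MAX maps to slot 7 and the next counter value, 0, maps to slot 0, as it should. For a ring of 10 entries, UINT32_MAX maps to slot 5 but the next value maps to slot 0 rather than 6, because 2**32 is not a multiple of 10.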

Packing {x,y} into a 32-bit word is a bit dirty. I'd guess that we
can justify it by claiming that the original code also has a similar
limitation of width*height <= INT_MAX.

Yes, it is a bit dirty. In practice pixel fields almost never get
above 16 bits in each direction, and the code is easy enough to
change (by putting two 32-bit quantities into a 64-bit type) if
it becomes necessary to accommodate an enormous pixel field.
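
A minimal sketch of that change, with made-up helper names (pack16, pack_point, unpack_point, Point32). pack16 mirrors the x<<16 | y scheme of the posted code for comparison; the 64-bit variant keeps a full 32 bits for each coordinate.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y; } Point32;   /* hypothetical name */

/* 16 bits per coordinate in a 32-bit word, as in the posted code.
   Coordinates of 65536 or more do not survive the round trip. */
static inline uint32_t pack16(uint32_t x, uint32_t y)
{
    return x << 16 | y;
}

/* 32 bits per coordinate in a 64-bit word. */
static inline uint64_t pack_point(uint32_t x, uint32_t y)
{
    return (uint64_t)x << 32 | y;
}

static inline Point32 unpack_point(uint64_t p)
{
    Point32 r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}
```

The todo buffer would then hold uint64_t entries, doubling its memory footprint, which is presumably why the narrower packing was chosen for ordinary image sizes.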

Removal of FIFO empty and almost-full tests in the inner loop helps
solid shapes, but appears to slow down "drawn" shapes. Since solid
shapes are the slowest to fill, it is probably a good trade-off.

Taking those tests out of the inner loop helps when there is big
frontier set, because the tests don't have to be done as often.
When the frontier set is small, as we would expect for long
skinny shapes, doing that doesn't help as much (and of course
other overhead may make it worse in such cases).

Overall, it is faster than my implementation of your algorithm.
Esp. so for solid shapes. Esp. of esp. so on Intel Skylake CPUs
where speed up is up to 1.75x.

More complicated 'St. George Cross' algorithms are still faster
for solid shapes and for shapes dominated by long horizontal or
long vertical lines. But they are ... well ... more complicated.
And [on Skylake] their worst case ('slalom' shape) is somewhat
slower in absolute sense than the worst case of your code (a solid
bar).

I played around with a "non-local" (in your terminology) version
of my most recently posted code, and discovered some things.
First the non-local version is somewhat faster on some shapes,
but noticeably slower on others. The non-local version is more
sensitive to which starting point is chosen. In a way it looks
similar to what happens with compression algorithms - some cases
get better, others get decidedly worse. I didn't do a lot of
experiments in an effort to determine what the range or relative
proportions of the different behaviors are.

After thinking about it a bit, it seems to me that a local-only
method is "more queue-like" and a non-local method is "more
stack-like". Using a pure queue plods along very predictably,
never getting much better or much worse. Being more stack-like
sometimes gets a speedup, but sometimes stumbles into the pit of
despair, which in the worse case needs a lot of memory and does
more memory shuffling. So for a general algorithm I'd opt for a
local-only method. Later on it might be good to use a more
tailored algorithm for special cases, if we can identify which
cases are special in a way that isn't too expensive.


From Peter 'Shaggy' Haywood@21:1/5 to All on Fri Mar 22 13:04:39 2024

Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
am. It's a cool scene! Dig it.

i was slightly thinking a bit of this recursion more generally and
i observed that those very long depth chains are kinda problem of
this recursion because maybe it is more fitted to be run parallel

I wasn't going to post this here, since it's really an algorithm
issue, rather than a C issue. But the thread seems to have gone on for
some time with you seeming to be unable to solve this. So I'll give you
this as a clue.
The (or, at least, a) solution is only partially recursive. What I
have used is a line-based algorithm, each line being filled iteratively
(in a simple loop) from left to right. Recursion from line to line
completes the algorithm. Thus, the recursion level is greatly reduced.
And you should find that this approach fills an area of any shape.
Note, however, that for some pathological cases (very large and
complex shapes), this can still create a fairly large level of
recursion. Maybe a more complex approach can deal with this. What I
present here is just a very simple one which, in most cases, should
have a level of recursion well within reason.
I use a two part approach. The first part (called floodfill in the
code below) just sets up for the second part. The second part (called
r_floodfill here, for recursive floodfill) does the actual work, but is
only called by floodfill(). It goes something like this (although this
is incomplete, untested and not code I've actually used, just an
example):

/* Assumes get_pixel() returns a value different from old_clr for any
   coordinate outside the image, including the wrapped value of 0u - 1. */
static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
                        pixel_t old_clr)
{
    unsigned start, end;

    /* Find start and end of line within floodfill area. */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
        --start;
    while(old_clr == get_pixel(y, end + 1))
        ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
        set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
       and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
        if(old_clr == get_pixel(y - 1, x))
            r_floodfill(y - 1, x, new_clr, old_clr);
        if(old_clr == get_pixel(y + 1, x))
            r_floodfill(y + 1, x, new_clr, old_clr);
    }
}

void floodfill(unsigned y, unsigned x, pixel_t new_clr)
{
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
        r_floodfill(y, x, new_clr, old_clr);
}

To use this, simply call floodfill() passing the coordinates of the
starting point for the fill (y and x) and the fill colour (new_clr).

--

----- Dig the NEW and IMPROVED news sig!! -----

-------------- Shaggy was here! ---------------
Ain't I'm a dawg!!


From Michael S@21:1/5 to Peter 'Shaggy' Haywood on Fri Mar 22 17:55:26 2024

On Fri, 22 Mar 2024 13:04:39 +1100
Peter 'Shaggy' Haywood <[email protected]> wrote:

Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
am. It's a cool scene! Dig it.

i was slightly thinking a bit of this recursion more generally and
i observed that those very long depth chains are kinda problem of
this recursion becouse maybe it is more fitted to be run parrallel

I wasn't going to post this here, since it's really an algorithm
issue, rather than a C issue. But the thread seems to have gone on for
some time with you seeming to be unable to solve this. So I'll give
you this as a clue.
The (or, at least, a) solution is only partially recursive. What I
have used is a line-based algorithm, each line being filled
iteratively (in a simple loop) from left to right. Recursion from
line to line completes the algorithm. Thus, the recursion level is
greatly reduced. And you should find that this approach fills an area
of any shape. Note, however, that for some pathological cases (very
large and complex shapes), this can still create a fairly large level
of recursion. Maybe a more complex approach can deal with this. What I
present here is just a very simple one which, in most cases, should
have a level of recursion well within reason.
I use a two part approach. The first part (called floodfill in the
code below) just sets up for the second part. The second part (called
r_floodfill here, for recursive floodfill) does the actual work, but
is only called by floodfill(). It goes something like this (although
this is incomplete, untested and not code I've actually used, just an
example):

static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
pixel_t old_clr)
{
unsigned start, end;

/* Find start and end of line within floodfill area. */
start = end = x;
while(old_clr == get_pixel(y, start - 1))
--start;
while(old_clr == get_pixel(y, end + 1))
++end;

/* Fill line with new colour. */
for(x = start; x <= end; x++)
set_pixel(y, x, new_clr);

/* Run along again, checking pixel colours above and below,
and recursing if appropriate. */
for(x = start; x <= end; x++)
{
if(old_clr == get_pixel(y - 1, x))
r_floodfill(y - 1, x, new_clr, old_clr);
if(old_clr == get_pixel(y + 1, x))
r_floodfill(y + 1, x, new_clr, old_clr);
}
}

void floodfill(unsigned y, unsigned x, pixel_t new_clr)
{
pixel_t old_clr = get_pixel(y, x);

/* Only proceed if colours differ. */
if(new_clr != old_clr)
r_floodfill(y, x, new_clr, old_clr);
}

To use this, simply call floodfill() passing the coordinates of the
starting point for the fill (y and x) and the fill colour (new_clr).

It looks like anisotropic variant of my St. George Cross algorithm.
Or like recursive variant of Malcolm's algorithm.
Being anisotropic, it has higher amount of glass jaws. In particular,
it would be very slow for not uncommon 'jail window' patterns.
* *** *** *** ***
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
*** *** *** *** *

Also, implementation is still recursive and the worst-case recursion
depth is still O(N), where N is total number of recolored pixels, so
unlike many other solutions presented here, you didn't solve fir's
original problem.
And in the presented form it's not thread-safe. Which is not a problem
for fir, but undesirable for the rest of us.

Conclusion: sorry, you aren't going to get a cookie for your effort.


From Michael S@21:1/5 to Michael S on Fri Mar 22 18:31:16 2024

On Fri, 22 Mar 2024 17:55:26 +0300
Michael S <[email protected]> wrote:

On Fri, 22 Mar 2024 13:04:39 +1100
Peter 'Shaggy' Haywood <[email protected]> wrote:

To use this, simply call floodfill() passing the coordinates of
the starting point for the fill (y and x) and the fill colour
(new_clr).

It looks like anisotropic variant of my St. George Cross algorithm.
Or like recursive variant of Malcolm's algorithm.
Being anisotropic, it has higher amount of glass jaws. In particular,
it would be very slow for not uncommon 'jail window' patterns.
* *** *** *** ***
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
*** *** *** *** *

Also, implementation is still recursive and the worst-case recursion
depth is still O(N), where N is total number of recolored pixels, so
unlike many other solutions presented here, you didn't solve fir's
original problem.
And in the presented form it's not thread-safe. Which is not a problem
for fir, but undesirable for the rest of us.

Conclusion: sorry, you aren't going to get a cookie for your effort.

So, what is my own practical answer?
Assuming that speed is not a top priority, that simplicity is pretty
high on the priority scale, and that it should work with big images and
the default stack size under Windows, I will go with the following
not particularly fast and not particularly slow algorithm that I call
"deferred stack". That is, it's mostly an explicit stack, but (explicit)
recursion is deferred until all four neighbors of the current pixel are
saved on the todo stack.
"Not particularly slow" means that I did see cases where some other
algorithm is 2 times faster, but have never seen a 3x difference.
In case x and y are known to fit in uint16_t, the UI type could be
redefined accordingly. It will make execution faster, but not by much.

#include <stdlib.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char Color;
typedef int UI;

/* Returns 1 if pixels were recolored, 0 if there was nothing to do,
   -1 if memory for the todo stack could not be allocated. */
int floodfill_r(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
{
    if (width < 0 || height < 0)
        return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;

    if (old == new)  /* nothing to do; also avoids re-pushing pixels forever */
        return 0;

    size_t x = x0;
    size_t y = y0;
    if (image[y*width+x] != old)
        return 0;

    const ptrdiff_t INITIAL_TODO_SIZE = 128;
    struct Point { UI x, y; } ;
    struct Point *todo = malloc(INITIAL_TODO_SIZE * sizeof *todo );
    if (!todo)
        return -1;
    struct Point* todo_end = &todo[INITIAL_TODO_SIZE];

    todo[0].x = x; todo[0].y = y;
    struct Point* sp = &todo[1];
    do {
        /* Pop one pixel; it may have been recolored since it was pushed. */
        x = sp[-1].x; y = sp[-1].y;
        --sp;
        if (image[y*width+x] == old) {
            image[y*width+x] = new;
            if (x > 0 && image[y*width+x-1] == old) {
                sp->x = x - 1; sp->y = y; ++sp;
            }
            if (y > 0 && image[y*width+x-width] == old) {
                sp->x = x; sp->y = y - 1; ++sp;
            }
            if (x+1 < width && image[y*width+x+1] == old) {
                sp->x = x + 1; sp->y = y; ++sp;
            }
            if (y+1 < height && image[y*width+x+width] == old) {
                sp->x = x; sp->y = y + 1; ++sp;
            }

            /* Keep room for the four pushes of the next iteration. */
            if (todo_end-sp < 4) {
                ptrdiff_t used = sp-todo;
                ptrdiff_t size = todo_end - todo;
                size += size/4;
                struct Point* new_todo = realloc(todo, size * sizeof *todo );
                if(!new_todo) {
                    free(todo);
                    return -1;
                }
                todo = new_todo;
                sp = &todo[used];
                todo_end = &todo[size];
            }
        }
    } while (sp != todo);

    free( todo );
    return 1;
}


From Ben Bacarisse@21:1/5 to Malcolm McLean on Sat Mar 23 00:21:00 2024

Malcolm McLean <[email protected]> writes:

On 17/03/2024 11:25, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

On 16/03/2024 13:55, Ben Bacarisse wrote:

Malcolm McLean <[email protected]> writes:

Recursion make programs harder to reason about and prove correct.

Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?

Example given. A recursive algorithm which is hard to reason about and
prove correct, because we don't really know whether under perfectly
reasonable assumptions it will or will not blow the stack.

Had you offered a proof that your code neither "blows the stack" nor
runs out of any other resource we'd have a starting point for
comparison, but you have not done that.
Mind you, had you done that, we would have something that might
eventually become only one piece of evidence for what is an
astonishingly general remark. Broadly applicable remarks require either
broadly applicable evidence or a wealth of distinct cases.
Your "rule" suggests that all reasoning is impeded by the presence of
recursion and I don't think you can support that claim. This is
characteristic of many of your remarks -- they are general "rules" that
often remain rules even when there is evidence to the contrary.
I'll make another point in the hope of clarifying the matter. An
algorithm or code is usually proved correct (or not!) under the
assumption that it has adequate resources -- usually time and storage.
Further reasoning may then be done to determine the resource
requirements since this is so often dependent on context. This
separation is helpful as you don't usually want to tie "correctness" to
some specific installation. The code might run on a system with a
dynamically allocated stack, for example, that has very similar
limitations to "heap" memory.
To put it more generally, we often want to prove properties of code that
are independent of physical constraints. Your remark includes this kind
of reasoning. Did you intend it to?

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited whilst stacks are not.

I put off answering for enough time that I now don't care anymore. I
think that's a win for everyone.

--
Ben.


From fir@21:1/5 to Peter 'Shaggy' Haywood on Sat Mar 23 11:06:42 2024

Peter 'Shaggy' Haywood wrote:

Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
am. It's a cool scene! Dig it.

i was slightly thinking a bit of this recursion more generally and
i observed that those very long depth chains are kinda problem of this
recursion becouse maybe it is more fitted to be run parrallel

I wasn't going to post this here, since it's really an algorithm
issue, rather than a C issue. But the thread seems to have gone on for
some time with you seeming to be unable to solve this. So I'll give you
this as a clue.
The (or, at least, a) solution is only partially recursive. What I
have used is a line-based algorithm, each line being filled iteratively
(in a simple loop) from left to right. Recursion from line to line
completes the algorithm. Thus, the recursion level is greatly reduced.
And you should find that this approach fills an area of any shape.
Note, however, that for some pathological cases (very large and
complex shapes), this can still create a fairly large level of
recursion. Maybe a more complex approach can deal with this. What I
present here is just a very simple one which, in most cases, should
have a level of recursion well within reason.
I use a two part approach. The first part (called floodfill in the
code below) just sets up for the second part. The second part (called
r_floodfill here, for recursive floodfill) does the actual work, but is
only called by floodfill(). It goes something like this (although this
is incomplete, untested and not code I've actually used, just an
example):

static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr, pixel_t old_clr)
{
unsigned start, end;

/* Find start and end of line within floodfill area. */
start = end = x;
while(old_clr == get_pixel(y, start - 1))
--start;
while(old_clr == get_pixel(y, end + 1))
++end;

/* Fill line with new colour. */
for(x = start; x <= end; x++)
set_pixel(y, x, new_clr);

/* Run along again, checking pixel colours above and below,
and recursing if appropriate. */
for(x = start; x <= end; x++)
{
if(old_clr == get_pixel(y - 1, x))
r_floodfill(y - 1, x, new_clr, old_clr);
if(old_clr == get_pixel(y + 1, x))
r_floodfill(y + 1, x, new_clr, old_clr);
}
}

void floodfill(unsigned y, unsigned x, pixel_t new_clr)
{
pixel_t old_clr = get_pixel(y, x);

/* Only proceed if colours differ. */
if(new_clr != old_clr)
r_floodfill(y, x, new_clr, old_clr);
}

To use this, simply call floodfill() passing the coordinates of the
starting point for the fill (y and x) and the fill colour (new_clr).

well this is an ok improvement for consideration - i however resolved
the problem even in 3 ways, as you could note reading more carefully
1) set the stack to 100 MB and forget
2) i wrote a straightforward iterative version (in draft) (the one with
AddPixel(.. )
3) i noticed that the best method would be to introduce a so called call
queue in c (probably the best solution imo)


From Scott Lurndal@21:1/5 to All on Sat Mar 23 14:43:49 2024

Malcolm McLean <[email protected]> writes:

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited whilst stacks are not.

That's not actually true. The size of both are bounded, yes.

It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.
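
A minimal POSIX-only sketch of how a program could inspect the limit mentioned above. The function name stack_soft_limit is made up; getrlimit, RLIMIT_STACK and RLIM_INFINITY are the standard companions of setrlimit in <sys/resource.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Queries the soft limit on the stack size of the calling process.
   Returns  1 if the limit is RLIM_INFINITY (no limit imposed),
            0 if it is finite, storing the byte count in *bytes,
           -1 if getrlimit() fails. */
int stack_soft_limit(unsigned long long *bytes)
{
    struct rlimit rl;

    assert(bytes != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    *bytes = (unsigned long long)rl.rlim_cur;
    return 0;
}
```

The same call with RLIMIT_DATA or RLIMIT_AS reports limits that can bound heap growth, so which of the two regions is the tighter one depends on how the process was configured.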


From Tim Rentsch@21:1/5 to Scott Lurndal on Sat Mar 23 11:48:16 2024

[email protected] (Scott Lurndal) writes:

Malcolm McLean <[email protected]> writes:

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.

That's not actually true. The size of both are bounded, yes.

It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.

The sizes of both heaps and stacks are bounded, because
pointers have a fixed number of bits. Certainly these
sizes can be very very large, but they are not unbounded.


From Michael S@21:1/5 to Tim Rentsch on Sun Mar 24 19:33:52 2024

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <[email protected]> wrote:

Tim Rentsch <[email protected]> writes:

[...]

Here is the refinement that uses a resizing rather than
fixed-size buffer.
[code]

This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.

Slower with some shapes, faster in others.

In my small test suite I found no cases where this specific code is
measurably faster than Malcolm's code.

My test cases include pixel fields of 32k by 32k, with for
example filling the entire field starting at the center point.
Kind of a stress test but it turned up some interesting results.

I did find one case in which they are approximately equal. I call
it "slalom shape" and it's more or less designed to be the worst
case for algorithms that try to speed themselves up by taking
advantage of straight lines.
The slalom shape is generated by the following code:
[code]

Thanks, I may try that.

In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.

Yes, your code is easy to understand. Could have been easier
still if persistent indices had more meaningful names.

I have a different view on that question. However I take your
point.

In another post I showed an optimized variant of your algorithm:
- 4-neighbors loop unrolled. Most of the speedup comes not from the
  unrolling itself, but from the specialization of the in-rectangle
  check that the unrolling enables.
- Todo queue implemented as a circular buffer.
- Initial size of the queue increased.
This optimized variant is more competitive with 'line-grabby'
algorithms in filling solid shapes and faster than them in the
'slalom' case.

Yes, unrolling is an obvious improvement. I deliberately chose a
simple (and non-optimized) method to make it easier to see how it
works. Simple optimizations are left as an exercise for the
reader. :)

Generally, I like your algorithm.
It was surprising to me that a queue can work better than a stack; my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if the existence of this or a similar
limit can be proven theoretically.
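
For anyone who wants to measure the queue depth themselves, here is a minimal sketch of a breadth-first 4-neighbor fill that records its peak queue depth. It is not Tim's posted code (that is elided as [code] above), only a generic illustration. The names bfs_fill and Point are made up, and the fixed, caller-chosen queue capacity is an assumption to keep it short; a real version would grow the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned char Color;
typedef struct { int x, y; } Point;   /* hypothetical helper type */

/* Breadth-first fill using a circular buffer as the todo queue.
   Returns the number of recolored pixels, or -1 on allocation failure
   or queue overflow (the image may then be partially recolored).
   *peak receives the largest number of entries the queue ever held. */
long bfs_fill(Color *image, int width, int height, int x0, int y0,
              Color old, Color new_color, size_t capacity, size_t *peak)
{
    assert(image && width > 0 && height > 0 && capacity > 0 && peak);
    *peak = 0;
    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height) return 0;
    if (old == new_color || image[(size_t)y0 * width + x0] != old) return 0;

    Point *q = malloc(capacity * sizeof *q);
    if (!q) return -1;

    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    size_t head = 0, count = 0;
    long filled = 1;

    /* Recolor on enqueue, so no pixel is ever queued twice. */
    image[(size_t)y0 * width + x0] = new_color;
    q[0] = (Point){ x0, y0 };
    count = 1;
    *peak = 1;

    while (count > 0) {
        Point p = q[head];                 /* dequeue the oldest entry */
        head = (head + 1) % capacity;
        --count;
        for (int i = 0; i < 4; ++i) {
            int nx = p.x + dx[i], ny = p.y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            Color *px = &image[(size_t)ny * width + nx];
            if (*px != old) continue;
            if (count == capacity) { free(q); return -1; }
            *px = new_color;
            q[(head + count) % capacity] = (Point){ nx, ny };
            ++count;
            ++filled;
            if (count > *peak) *peak = count;
        }
    }
    free(q);
    return filled;
}
```

Since every pixel is queued at most once, a capacity of width*height can never overflow; the interesting number is how far below that the reported peak stays.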


From Michael S@21:1/5 to Tim Rentsch on Sun Mar 24 20:27:58 2024

On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:

The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.

One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.

To be fair, none of the presented algorithms is abysmally slow.
When compared by number of visited points, they all appear to be within
a factor of 2 or 2.5 of each other.
Some of them could be 10-15 times slower than others for some
patterns, but it does not happen for all patterns, and when it happens
it's because of a problematic implementation rather than because of
differences in algorithms.
Even the original naive recursive algorithm is not too slow when
implemented in optimized asm: 2.2x slower than the fastest for a solid
square shape and closer than that for other shapes.

The big difference between algorithms is not speed, but the amount of
auxiliary memory used in the worst case. Your algorithm appears to be
the best in that department, Malcolm's algorithm is also quite good,
and all the others (plain recursion, stacks, my deferred stack, all my
cross variants, the lines-oriented recursion of Peter 'Shaggy'
Haywood) are a lot worse.

But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than
difference between algorithms.

For example, a solid 200x200 image with the starting point in the corner,
recolored by the code presented in Malcolm's first post (not his own
algorithm, but the recursive algorithm that he presented as a reference
point), consumes 5,094,784 bytes of stack on x86-64/gcc. After a small
modification (all non-changing parameters aggregated in a structure
and passed by reference) the footprint falls to 2,547,328 B.
Coding the same algorithm (well, almost the same) in asm reduces it to
320,000 B. Coding it with an explicit stack cuts it to 40,000 B. I
haven't actually coded it, but I know how to compress the explicit stack
down to 2 bits per level of recursion. If implemented, it would take
10,000 B, i.e. comparable with Malcolm's much more economical algorithm
and 512x smaller than the original implementation of [well, almost] the
same algorithm!
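
To make the 2-bits-per-level arithmetic concrete, here is a rough sketch of a packed stack that stores four 2-bit entries per byte. It is not code from this thread: Stack2, s2_bytes, s2_push and s2_pop are hypothetical names, and the fixed caller-provided buffer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packed stack: four 2-bit entries per byte. */
typedef struct {
    unsigned char *bits;   /* caller-provided storage */
    size_t         depth;  /* number of entries currently stored */
    size_t         max;    /* capacity in entries */
} Stack2;

/* Bytes needed to hold n 2-bit entries. */
size_t s2_bytes(size_t n)
{
    return (n + 3) / 4;
}

/* Push a value in the range 0..3. */
void s2_push(Stack2 *s, unsigned v)
{
    assert(s->depth < s->max && v < 4);
    size_t   byte  = s->depth / 4;
    unsigned shift = (unsigned)(s->depth % 4) * 2;
    /* clear the 2-bit slot, then store the new value in it */
    s->bits[byte] = (unsigned char)((s->bits[byte] & ~(3u << shift))
                                    | (v << shift));
    ++s->depth;
}

/* Pop the most recently pushed value. */
unsigned s2_pop(Stack2 *s)
{
    assert(s->depth > 0);
    --s->depth;
    unsigned shift = (unsigned)(s->depth % 4) * 2;
    return (s->bits[s->depth / 4] >> shift) & 3u;
}
```

For the 200x200 case, s2_bytes(200 * 200) comes to 10,000 bytes, against 40,000 bytes for one byte per level.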


From Tim Rentsch@21:1/5 to Michael S on Sun Mar 24 10:24:45 2024

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.

I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.

As for finding a worst case, try this (expressed in pseudo code):

let pc = { width/2, height/2 }
// assume pixel field 'field' starts out as all zeroes
color 8 "legs" with the value '1' as follows:

leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }

So the pixel field should look like this (with longer legs for a
bigger pixel field), with '-' being 0 and '*' being 1:

+-----------------------+
| - - - - - - - - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - * * * * - * * * * - |
| - - - - - - - - - - - |
| - * * * * - * * * * - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - - - - - - - - |
+-----------------------+

Now start coloring at the center point with a new value
of 2.
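
In case it saves someone some typing, the pattern might be generated along these lines. This is only a sketch: draw_legs and leg are made-up names, the field is assumed to be a zeroed row-major byte array, and the lower vertical legs are assumed to run down to height-2, which is what the picture shows.

```c
#include <assert.h>
#include <stddef.h>

/* Set the pixels from (xa,ya) to (xb,yb) inclusive to 1. The leg must be
   horizontal or vertical, with xa <= xb and ya <= yb. */
static void leg(unsigned char *field, int width,
                int xa, int ya, int xb, int yb)
{
    for (int y = ya; y <= yb; ++y)
        for (int x = xa; x <= xb; ++x)
            field[(size_t)y * width + x] = 1;
}

/* Draw the eight legs around the center of a zeroed width x height field. */
void draw_legs(unsigned char *field, int width, int height)
{
    assert(field && width >= 5 && height >= 5);
    int cx = width / 2, cy = height / 2;

    /* four horizontal legs, just above and just below the center row */
    leg(field, width, 1,      cy - 1, cx - 1,    cy - 1);
    leg(field, width, 1,      cy + 1, cx - 1,    cy + 1);
    leg(field, width, cx + 1, cy - 1, width - 2, cy - 1);
    leg(field, width, cx + 1, cy + 1, width - 2, cy + 1);

    /* four vertical legs, just left and just right of the center column */
    leg(field, width, cx - 1, 1,      cx - 1, cy - 1);
    leg(field, width, cx + 1, 1,      cx + 1, cy - 1);
    leg(field, width, cx - 1, cy + 1, cx - 1, height - 2);
    leg(field, width, cx + 1, cy + 1, cx + 1, height - 2);
}
```

Called on a zeroed 11x11 field, this reproduces the picture; the fill then starts at { width/2, height/2 } with old value 0 and new value 2.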


From Tim Rentsch@21:1/5 to Michael S on Sun Mar 24 13:26:16 2024

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.

One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.

To be fair, none of presented algorithms is abysmally slow. When
compared by number of visited points, they all appear to be within
factor of 2 or 2.5 of each other.

Certainly "abysmally slow" is subjective, but working in a large
pixel field, filling the whole field starting at the center,
Malcolm's code runs slower than my unoptimized code by a factor of
10 (and a tad slower than that compared to my optimized code).

Some of them for some patterns could be 10-15 times slower than
others, but it does not happen for all patterns and when it
happens it's because of problematic implementation rather because
of differences in algorithms.

In the case of Malcolm's code I think it's the algorithm, because
it doesn't scale linearly. Malcolm's code runs faster than mine
for small colorings, but slows down dramatically as the image
being colored gets bigger.

The big difference between algorithms is not a speed, but amount of
auxiliary memory used in the worst case. Your algorithm appears to be
the best in that department, [...]

Yes, my unoptimized algorithm was designed to use as little
memory as possible. The optimized version traded space for
speed: it runs a little bit faster but incurs a non-trivial cost
in terms of space used. I think it's still not too bad, an upper
bound of a small multiple of N for an NxN pixel field.

But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than
difference between algorithms.

If I am not mistaken the original naive recursive algorithm has a
space cost that is O( N**2 ) for an NxN pixel field. The big-O
difference swamps everything else, just like the big-O difference
in runtime does for that metric.

For example, solid 200x200 image with starting point in the corner
[...]

On small pixel fields almost any algorithm is probably not too
bad. These days any serious algorithm should scale well up
to at least 4K by 4K, and be tested up to at least 16K x 16K.
Tricks that make some things faster for small images sometimes
fall on their face when confronted with a larger image. My code
isn't likely to win many races on small images, but on large
images I expect it will always be competitive even if it doesn't
finish in first place.


From Scott Lurndal@21:1/5 to Tim Rentsch on Sun Mar 24 20:48:52 2024

Tim Rentsch <[email protected]> writes:

[email protected] (Scott Lurndal) writes:

Malcolm McLean <[email protected]> writes:

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.

That's not actually true. The size of both are bounded, yes.

It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.

The sizes of both heaps and stacks are bounded, because
pointers have a fixed number of bits. Certainly these
sizes can be very very large, but they are not unbounded.

I was referring to the term of art used in POSIX, where
unlimited simply means that the operating system doesn't
limit them (and as I pointed out, physical limits, including
address space size (which is often only 48 bits, regardless
of the 64-bit pointer space)) will dominate.

$ ulimit -a
address space limit (Kibytes)  (-M)  unlimited
core file size (blocks)        (-c)  0
cpu time (seconds)             (-t)  unlimited
data size (Kibytes)            (-d)  unlimited
file size (blocks)             (-f)  unlimited
locks                          (-x)  unlimited
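
For anyone who wants to check this from C rather than from the shell, here is a small sketch using getrlimit from <sys/resource.h>. RLIMIT_STACK and RLIM_INFINITY are the standard POSIX names; the helper names query_stack_limit and print_limit are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard stack limits of the calling process.
   Returns 0 on success, -1 on failure (as getrlimit does). */
int query_stack_limit(struct rlimit *out)
{
    assert(out != NULL);
    return getrlimit(RLIMIT_STACK, out);
}

/* Print one limit the way 'ulimit' reports it: a number or "unlimited".
   "unlimited" only means the OS imposes no limit of its own. */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
}
```

Typical use is to call query_stack_limit once and pass rl.rlim_cur (soft limit) and rl.rlim_max (hard limit) to print_limit.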


From Michael S@21:1/5 to Tim Rentsch on Mon Mar 25 01:04:32 2024

On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.

I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.

As for finding a worst case, try this (expressed in pseudo code):

let pc = { width/2, height/2 }
// assume pixel field 'field' starts out as all zeroes
color 8 "legs" with the value '1' as follows:

leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }

So the pixel field should look like this (with longer legs for a
bigger pixel field), with '-' being 0 and '*' being 1:

+-----------------------+
| - - - - - - - - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - * * * * - * * * * - |
| - - - - - - - - - - - |
| - * * * * - * * * * - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - - - - - - - - |
+-----------------------+

Now start coloring at the center point with a new value
of 2.

Yes, I see. It is close to min(width,height)*4.
Thank you.


From Michael S@21:1/5 to Tim Rentsch on Mon Mar 25 01:28:44 2024

On Sun, 24 Mar 2024 13:26:16 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.

One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.

To be fair, none of presented algorithms is abysmally slow. When
compared by number of visited points, they all appear to be within
factor of 2 or 2.5 of each other.

Certainly "abysmally slow" is subjective, but working in a large
pixel field, filling the whole field starting at the center,
Malcolm's code runs slower than my unoptimized code by a factor of
10 (and a tad slower than that compared to my optimized code).

Some of them for some patterns could be 10-15 times slower than
others, but it does not happen for all patterns and when it
happens it's because of problematic implementation rather because
of differences in algorithms.

In the case of Malcolm's code I think it's the algorithm, because
it doesn't scale linearly. Malcolm's code runs faster than mine
for small colorings, but slows down dramatically as the image
being colored gets bigger.

The big difference between algorithms is not a speed, but amount of
auxiliary memory used in the worst case. Your algorithm appears to
be the best in that department, [...]

Yes, my unoptimized algorithm was designed to use as little
memory as possible. The optimized version traded space for
speed: it runs a little bit faster but incurs a non-trivial cost
in terms of space used. I think it's still not too bad, an upper
bound of a small multiple of N for an NxN pixel field.

But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than
difference between algorithms.

If I am not mistaken the original naive recursive algorithm has a
space cost that is O( N**2 ) for an NxN pixel field. The big-O
difference swamps everything else, just like the big-O difference
in runtime does for that metric.

For example, solid 200x200 image with starting point in the corner
[...]

On small pixel fields almost any algorithm is probably not too
bad. These days any serious algorithm should scale well up
to at least 4K by 4K, and be tested up to at least 16K x 16K.
Tricks that make some things faster for small images sometimes
fall on their face when confronted with a larger image. My code
isn't likely to win many races on small images, but on large
images I expect it will always be competitive even if it doesn't
finish in first place.

You are right. At 1920*1080, except for a few special patterns, your
code is faster than Malcolm's by a factor of 1.5x to 1.8x. Same for 4K.
Malcolm's auxiliary memory arrays are still quite small at these image
sizes, but speed suffers.
I wonder if it is a problem of the algorithm or of the implementation.
Since I still haven't figured out his idea, I can't improve his
implementation in order to check it.

One thing that I was not expecting at these bigger pictures is the good
performance of the simple recursive algorithm. Of course, not of the
original form of it, but of a variation with an explicit stack.
For many shapes it has quite a large memory footprint and despite that
it is not slow. Probably the stack has very good locality of reference.

Here is the code:

#include <stdlib.h>
#include <stddef.h>

typedef unsigned char Color;

// Returns 1 if pixels were recolored, 0 if there was nothing to do,
// -1 if memory for the explicit stack could not be allocated.
int floodfill4(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
{
    if (width <= 0 || height <= 0)
        return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;

    // nothing to do; without this check the loop below never terminates
    if (old == new)
        return 0;

    const size_t w = width;
    Color* image_end = &image[w*height];

    size_t x = x0;
    Color* row = &image[w*y0];
    if (row[x] != old)
        return 0;

    // one byte per level of simulated recursion
    const ptrdiff_t INITIAL_STACK_SZ = 256;
    unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
    if (!stack)
        return -1;
    unsigned char* sp = stack;
    unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

    // the pushed state says which neighbor the "caller" was visiting
    enum { ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN, ST_BEG };

recursive_call:
    row[x] = new;
    if (sp==end_stack) {
        // grow the stack by 50%
        ptrdiff_t size = sp - stack;
        ptrdiff_t new_size = size+size/2;
        unsigned char* new_stack = realloc(stack, new_size * sizeof(*stack));
        if (!new_stack) {
            free(stack);
            return -1;
        }
        stack = new_stack;
        sp = &stack[size];
        end_stack = &stack[new_size];
    }

    for (unsigned state = ST_BEG;;) {
        // the case labels are the points where a simulated call returns
        switch (state) {
        case ST_BEG:

            ++x;
            if (x != w) {
                if (row[x] == old) {
                    *sp++ = ST_RIGHT; goto recursive_call; // recursive call
                }
            }
            // fall through
        case ST_RIGHT:
            --x;

            if (x > 0) {
                --x;
                if (row[x] == old) {
                    *sp++ = ST_LEFT; goto recursive_call; // recursive call
                }
        case ST_LEFT:
                ++x;
            }

            if (row != image) {
                row -= w;
                if (row[x] == old) {
                    *sp++ = ST_UP; goto recursive_call; // recursive call
                }
        case ST_UP:
                row += w;
            }

            row += w;
            if (row != image_end) {
                if (row[x] == old) {
                    *sp++ = ST_DOWN; goto recursive_call; // recursive call
                }
        case ST_DOWN:
                ; // empty statement: before C23 a label must label a statement
            }
            row -= w;
            break;
        }

        if (sp == stack)
            break;

        state = *--sp; // pop stack (back to caller)
    }

    free(stack);
    return 1; // done
}


From Michael S@21:1/5 to Michael S on Tue Mar 26 17:52:18 2024

On Mon, 25 Mar 2024 01:28:44 +0300
Michael S <[email protected]> wrote:

On Sun, 24 Mar 2024 13:26:16 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <[email protected]> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.

One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.

To be fair, none of presented algorithms is abysmally slow. When
compared by number of visited points, they all appear to be within
factor of 2 or 2.5 of each other.

Certainly "abysmally slow" is subjective, but working in a large
pixel field, filling the whole field starting at the center,
Malcolm's code runs slower than my unoptimized code by a factor of
10 (and a tad slower than that compared to my optimized code).

Some of them for some patterns could be 10-15 times slower than
others, but it does not happen for all patterns and when it
happens it's because of problematic implementation rather because
of differences in algorithms.

In the case of Malcolm's code I think it's the algorithm, because
it doesn't scale linearly. Malcolm's code runs faster than mine
for small colorings, but slows down dramatically as the image
being colored gets bigger.

The big difference between algorithms is not a speed, but amount
of auxiliary memory used in the worst case. Your algorithm
appears to be the best in that department, [...]

Yes, my unoptimized algorithm was designed to use as little
memory as possible. The optimized version traded space for
speed: it runs a little bit faster but incurs a non-trivial cost
in terms of space used. I think it's still not too bad, an upper
bound of a small multiple of N for an NxN pixel field.

But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than
difference between algorithms.

If I am not mistaken the original naive recursive algorithm has a
space cost that is O( N**2 ) for an NxN pixel field. The big-O
difference swamps everything else, just like the big-O difference
in runtime does for that metric.

For example, solid 200x200 image with starting point in the corner
[...]

On small pixel fields almost any algorithm is probably not too
bad. These days any serious algorithm should scale well up
to at least 4K by 4K, and be tested up to at least 16K x 16K.
Tricks that make some things faster for small images sometimes
fall on their face when confronted with a larger image. My code
isn't likely to win many races on small images, but on large
images I expect it will always be competitive even if it doesn't
finish in first place.

You are right. At 1920*1080, except for a few special patterns, your
code is faster than Malcolm's by a factor of 1.5x to 1.8x. Same for 4K.
Malcolm's auxiliary memory arrays are still quite small at these image
sizes, but speed suffers.
I wonder if it is a problem of the algorithm or of the implementation.
Since I still haven't figured out his idea, I can't improve his
implementation in order to check it.

One thing that I was not expecting at these bigger pictures is the good
performance of the simple recursive algorithm. Of course, not of the
original form of it, but of a variation with an explicit stack.
For many shapes it has quite a large memory footprint and despite that
it is not slow. Probably the stack has very good locality of reference.

Here is the code:
<snip>

The most robust code that I have found so far, one that performs well
with small pictures as well as with large and huge ones, is a variation
on the same theme of the explicit stack, maybe more properly called
trace back. It operates on 2x2 squares instead of individual pixels.
The worst-case auxiliary memory footprint of this variant is rather big,
up to picture_size/4 bytes. The code is *not* simple, but the complexity
appears to be necessary for robust performance with various shapes and
sizes.

Todo-queue-based variants have a very low memory footprint and perform
well for as long as the recolored shape fits in the fast levels of the
cache hierarchy, but suffer a sharp slowdown when the shape grows beyond
that. It seems the problem with these algorithms is that the front
of recoloring is interleaved and the focus of processing jumps randomly
across the front, which leads to poor locality and to thrashing of the
cache. Maybe for huge pictures some sort of priority queue would
perform better than a simple FIFO? Maybe implemented as a binary heap?
https://en.wikipedia.org/wiki/Binary_heap

The thought is interesting, but it's unlikely that it could lead to
faster code than the one presented below.

#include <stdlib.h>
#include <stddef.h>

typedef unsigned char Color;

// Test the pixel at row[x] and the pixel directly below it.
// Result: bit 0 set if the upper pixel is 'old', bit 2 if the lower one is.
static inline
unsigned check_column(Color *row, size_t x, size_t w, Color *end_image,
                      Color old)
{
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (row+w != end_image && row[x+w] == old)
        b |= 1 << 2;
    return b;
}

// Test the pixel at row[x] and the pixel to its right.
// Result: bit 0 set if the left pixel is 'old', bit 1 if the right one is.
static inline
unsigned check_row(Color *row, size_t x, size_t w, Color old)
{
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (x+1 != w && row[x+1] == old)
        b |= 1 << 1;
    return b;
}

// Returns 1 if pixels were recolored, 0 if there was nothing to do,
// -1 if memory for the explicit stack could not be allocated.
int floodfill4(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
{
    if (width <= 0 || height <= 0)
        return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
        return 0;

    // nothing to do; without this check the loop below never terminates
    if (old == new)
        return 0;

    const size_t w = width;

    size_t col0 = x0;
    Color* row0 = &image[w * y0];
    if (row0[col0] != old)
        return 0;

    // Align the start to the top-left pixel of its 2x2 block;
    // offs records where the start pixel sits inside that block.
    int offs = 0;
    if (y0 & 1) {
        row0 -= w;
        offs = 2;
    }
    if (col0 & 1) {
        col0 -= 1;
        offs |= 1;
    }

    Color* end_image = &image[w * height];

    const ptrdiff_t INITIAL_STACK_SZ = 256;
    unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
    if (!stack)
        return -1;
    unsigned char* sp = stack;
    unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

    // Layout of one stack octet: bits 0-1 state, bits 2-5 mask of the
    // 2x2 block (MSK_Brc: r = row, c = column), bits 6-7 parent direction.
    enum {
        // state
        ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN,
        ST_BEG,
        STATE_BITS = 3,
        // mask
        MSK_B00 = 1 << 2, MSK_B01 = 1 << 3,
        MSK_B10 = 1 << 4, MSK_B11 = 1 << 5,
        MSK_B0x = MSK_B00 | MSK_B01,
        MSK_B1x = MSK_B10 | MSK_B11,
        MSK_Bx0 = MSK_B00 | MSK_B10,
        MSK_Bx1 = MSK_B01 | MSK_B11,
        MSK_Bxx = MSK_Bx0 | MSK_Bx1,
        MSK_BITS = MSK_Bxx,
        // from
        FROM_LEFT  = 0 << 6,
        FROM_RIGHT = 1 << 6,
        FROM_UP    = 2 << 6,
        FROM_DOWN  = 3 << 6,
        FROM_BITS  = 3 << 6,
    };

    unsigned bit_mask0 = check_row(row0, col0, w, old)*MSK_B00;
    if (row0+w != end_image)
        bit_mask0 |= check_row(row0+w, col0, w, old)*MSK_B10;
    // Drop the pixel diagonal to the start pixel when neither of the two
    // pixels that would connect it to the start belongs to the fill.
    static const unsigned char kill_diag_tab[4][2] = {
        {MSK_B01 | MSK_B10, ~MSK_B11},
        {MSK_B00 | MSK_B11, ~MSK_B10},
        {MSK_B00 | MSK_B11, ~MSK_B01},
        {MSK_B01 | MSK_B10, ~MSK_B00},
    };
    if ((bit_mask0 & kill_diag_tab[offs][0])==0)
        bit_mask0 &= kill_diag_tab[offs][1];

    for (int rep = 0; rep < 2; ++rep) {
        unsigned bit_mask = bit_mask0;
        Color* row = row0;
        size_t x = col0;
        unsigned from = rep == 0 ? FROM_DOWN : FROM_LEFT;

    recursive_call:
        if (bit_mask & MSK_B00) row[x+0] = new;
        if (bit_mask & MSK_B01) row[x+1] = new;
        if (bit_mask & MSK_B10) row[x+w+0] = new;
        if (bit_mask & MSK_B11) row[x+w+1] = new;

        if (sp==end_stack) {
            // grow the stack by 50%
            ptrdiff_t size = sp - stack;
            ptrdiff_t new_size = size+size/2;
            unsigned char* new_stack = realloc(stack, new_size *
                                               sizeof(*stack));
            if (!new_stack) {
                free(stack);
                return -1;
            }
            stack = new_stack;
            sp = &stack[size];
            end_stack = &stack[new_size];
        }

        for (unsigned state = ST_BEG;;) {
            // the case labels are the points where a simulated call returns
            switch (state) {
            case ST_BEG:

                if (from != FROM_RIGHT && bit_mask & MSK_Bx1) { // look right
                    x += 2;
                    if (x != w) {
                        unsigned bx0 = check_column(row, x, w, end_image, old);
                        if (bx0 & (bit_mask/MSK_B01)) {
                            // recursive call
                            *sp++ = bit_mask | from | ST_RIGHT;
                            bit_mask = bx0*MSK_B00;
                            x += 1;
                            if (x != w) {
                                unsigned bx1 = check_column(row, x, w, end_image, old);
                                if (bx0 & bx1)
                                    bit_mask |= bx1*MSK_B01;
                            }
                            x -= 1;
                            from = FROM_LEFT;
                            goto recursive_call;
                        }
                    }
            case ST_RIGHT:
                    x -= 2;
                }

                if (from != FROM_LEFT && bit_mask & MSK_Bx0) { // look left
                    if (x > 0) {
                        unsigned bx1 = check_column(row, x-1, w, end_image, old);
                        if (bx1 & (bit_mask/MSK_B00)) {
                            // recursive call
                            *sp++ = bit_mask | from | ST_LEFT;
                            bit_mask = bx1*MSK_B01;
                            x -= 2;
                            unsigned bx0 = check_column(row, x, w, end_image, old);
                            if (bx0 & bx1)
                                bit_mask |= bx0*MSK_B00;
                            from = FROM_RIGHT;
                            goto recursive_call;
            case ST_LEFT:
                            x += 2;
                        }
                    }
                }

                if (from != FROM_UP && bit_mask & MSK_B0x) { // look up
                    if (row != image) {
                        row -= w;
                        unsigned b1x = check_row(row, x, w, old);
                        row -= w;
                        if (b1x & (bit_mask/MSK_B00)) {
                            // recursive call
                            *sp++ = bit_mask | from | ST_UP;
                            bit_mask = b1x*MSK_B10;
                            unsigned b0x = check_row(row, x, w, old);
                            if (b0x & b1x)
                                bit_mask |= b0x*MSK_B00;
                            from = FROM_DOWN;
                            goto recursive_call;
            case ST_UP:
                            ; // empty statement: before C23 a label must label a statement
                        }
                        row += w*2;
                    }
                }

                if (from != FROM_DOWN && bit_mask & MSK_B1x) { // look down
                    row += w*2;
                    if (row != end_image) {
                        unsigned b0x = check_row(row, x, w, old);
                        if (b0x & (bit_mask/MSK_B10)) {
                            // recursive call
                            *sp++ = bit_mask | from | ST_DOWN;
                            bit_mask = b0x*MSK_B00;
                            row += w;
                            if (row != end_image) {
                                unsigned b1x = check_row(row, x, w, old);
                                if (b0x & b1x)
                                    bit_mask |= b1x*MSK_B10;
                            }
                            row -= w;
                            from = FROM_UP;
                            goto recursive_call;
                        }
                    }
            case ST_DOWN:
                    row -= w*2;
                }
                break;
            }

            if (sp == stack)
                break;

            unsigned stack_val = *--sp; // pop stack (back to caller)
            state    = stack_val & STATE_BITS;
            bit_mask = stack_val & MSK_BITS;
            from     = stack_val & FROM_BITS;
        }
    }

    free(stack);
    return 1; // done
}


From Tim Rentsch@21:1/5 to Michael S on Thu Mar 28 23:04:36 2024

Michael S <[email protected]> writes:

[..various fill algorithms and how they scale..]

One thing that I were not expecting at this bigger pictures, is good
performance of simple recursive algorithm. Of course, not of original
form of it, but of variation with explicit stack.
For many shapes it has quite large memory footprint and despite that it
is not slow. Probably the stack has very good locality of reference.

[algorithm]

You are indeed a very clever fellow. I'm impressed.

Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.

I see you have also done a revised algorithm based on the same
idea, but more elaborate (to save on runtime footprint?).
Still working on formulating a response to that one...


From Tim Rentsch@21:1/5 to Scott Lurndal on Thu Mar 28 22:51:21 2024

[email protected] (Scott Lurndal) writes:

Tim Rentsch <[email protected]> writes:

[email protected] (Scott Lurndal) writes:

Malcolm McLean <[email protected]> writes:

The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.

That's not actually true. The size of both are bounded, yes.

It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.

The sizes of both heaps and stacks are bounded, because
pointers have a fixed number of bits. Certainly these
sizes can be very very large, but they are not unbounded.

I was referring to the term of art used in POSIX, where
unlimited simply means that the operating system doesn't
limit them [.. elaboration ..]

The earlier sentence was confusing, as the sentence construction
suggested "unlimited" was a general term rather than one with a
specific meaning in POSIX. In any case thank you for the
education.


From Michael S@21:1/5 to Tim Rentsch on Fri Mar 29 15:21:41 2024

On Thu, 28 Mar 2024 23:04:36 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[..various fill algorithms and how they scale..]

One thing that I were not expecting at this bigger pictures, is good
performance of simple recursive algorithm. Of course, not of
original form of it, but of variation with explicit stack.
For many shapes it has quite large memory footprint and despite
that it is not slow. Probably the stack has very good locality of
reference.

[algorithm]

You are indeed a very clever fellow. I'm impressed.

Yes, the use of switch is clever :(
It more resembles a computed GO TO in old FORTRAN or indirect jumps in
asm than an idiomatic C switch. But it is legal* C.

Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.

If non-trivially different, why not?

I see you have also done a revised algorithm based on the same
idea, but more elaborate (to save on runtime footprint?).
Still working on formulating a response to that one...

The original purpose of enhancement was to amortize non-trivial and
probably not very fast call stack emulation logic over more than one
pixel. 2x2 just happens to be the biggest block that still has very
simple in-block recoloring logic. ~4x reduction in the size of
auxiliary memory is just a pleasant side effect.

Exactly the same 4x reduction in memory size could have been achieved
with single-pixel variant by using packed array for 2-bit state
(==trace back) stack elements. But it would be the same or slower than
original while the enhanced variant is robustly faster than original.

After implementing the first enhancement I noticed that at 4K
size the timing (per pixel) for a few of my test cases is significantly
worse than at smaller images. So, I added another enhancement aiming to
minimize cache thrashing effects by never looking back at the immediate
parent of the current block. The info about the location of the parent
nicely fitted into the remaining 2 bits of the stack octet.

------
* - the versions I posted are not exactly legal C; they are illegal C that
gcc does not reject. But they can be trivially made into legal C by
adding a semicolon after one of the case labels.
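
For readers who have not run into this rule: in C before C23, a label (a case label included) must be followed by a statement, so a label sitting directly before a closing brace is not valid, even though some gcc versions let it through without -pedantic. As far as I know C23 relaxed this. A minimal sketch of the one-semicolon fix follows; the function classify is made up and is not taken from the posted fill code.

```c
/* Hypothetical example, unrelated to the fill code. */
int classify(int v)
{
    int r = 0;
    switch (v) {
    case 0:
        r = 1;
        break;
    case 1:
        ;   /* null statement: without this semicolon the label would
               directly precede the closing brace, which C before C23
               does not allow */
    }
    return r;
}
```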


From Tim Rentsch@21:1/5 to Michael S on Fri Mar 29 23:58:26 2024

Michael S <[email protected]> writes:

On Thu, 28 Mar 2024 23:04:36 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[..various fill algorithms and how they scale..]

One thing that I was not expecting with these bigger pictures is
the good performance of the simple recursive algorithm. Of course,
not of the original form of it, but of the variation with an explicit
stack. For many shapes it has quite a large memory footprint and
despite that it is not slow. Probably the stack has very good
locality of reference.

[algorithm]

You are indeed a very clever fellow. I'm impressed.

Yes, the use of switch is clever :(

In my view the cleverness is how "recursion" is accomplished by a
means of a combination of using a stack to store a "return address"
and restoring state by undoing a change rather than storing the
old value. Using a switch() is just a detail (and to my way of
thinking how the switch() is done here obscures the basic idea and
makes the code harder to understand, but never mind that).
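
The idea described above (push only a "return address", then restore the position by undoing the move) can be sketched without goto or a jump-style switch. This is an illustrative rewrite under my own assumptions, not the code either poster wrote: the name trace_fill, the row-major layout and the direction tables are all made up here, and the stack is simply allocated at its worst-case size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Direction codes stored on the trace-back stack (hypothetical encoding). */
static const int DX[4] = { 1, -1, 0, 0 };
static const int DY[4] = { 0, 0, 1, -1 };

/* Recolors the 4-connected region of old_c containing (x, y).
   Assumes a row-major image with 0 <= x < w and 0 <= y < h.
   Returns the number of pixels recolored, or -1 if malloc fails. */
long trace_fill(unsigned char *img, int w, int h, int x, int y,
                unsigned char old_c, unsigned char new_c)
{
    if (old_c == new_c || img[(size_t)y * w + x] != old_c)
        return 0;

    /* One byte per step; at most one step per pixel. */
    unsigned char *stack = malloc((size_t)w * h);
    if (!stack)
        return -1;

    size_t sp = 0;
    long count = 1;
    int dir = 0;                    /* next direction to try from (x, y) */
    img[(size_t)y * w + x] = new_c;

    for (;;) {
        if (dir < 4) {
            int nx = x + DX[dir], ny = y + DY[dir];
            if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
                img[(size_t)ny * w + nx] == old_c) {
                stack[sp++] = (unsigned char)dir;   /* the "return address" */
                x = nx;
                y = ny;
                img[(size_t)y * w + x] = new_c;
                count++;
                dir = 0;
            } else {
                dir++;
            }
        } else {
            if (sp == 0)
                break;              /* back at the seed with nothing left */
            int d = stack[--sp];
            x -= DX[d];             /* undo the move instead of storing x, y */
            y -= DY[d];
            dir = d + 1;            /* resume after the direction we took */
        }
    }
    free(stack);
    return count;
}
```

The stack never holds coordinates, only the 2-bit direction of each step, which is why the footprint is one small element per pixel on the current path.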

It more resembles a computed GO TO in old FORTRAN or indirect jumps
in asm than an idiomatic C switch. But it is legal* C.

I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.

Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.

If non-trivially different, why not?

I hope to soon but am unable to right now (and maybe for a week
or so due to circumstances beyond my control). For sure the
code is different; whether it is non-trivially different I
leave for others to judge.

I see you have also done a revised algorithm based on the same
idea, but more elaborate (to save on runtime footprint?).
Still working on formulating a response to that one...

The original purpose of enhancement was to amortize non-trivial
and probably not very fast call stack emulation logic over more
than one pixel. 2x2 just happens to be the biggest block that
still has very simple in-block recoloring logic. ~4x reduction in
the size of auxiliary memory is just a pleasant side effect.

Exactly the same 4x reduction in memory size could have been
achieved with single-pixel variant by using packed array for 2-bit
state (==trace back) stack elements. But it would be the same or
slower than original while the enhanced variant is robustly faster
than original.

An alternate idea is to use a 64-bit integer for 32 "top of stack"
elements, or up to 32 I should say, and a stack with 64-bit values.
Just an idea, it may not turn out to be useful.
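
A rough sketch of what such a packed stack could look like, under my own assumptions: the names Stack2, s2_push and s2_pop are invented here, the 64-bit word holds up to 32 two-bit entries with the newest in the low bits, and full words are spilled to a growable array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed stack of 2-bit values. */
typedef struct {
    uint64_t  top;      /* up to 32 entries, newest in the low 2 bits */
    unsigned  ntop;     /* number of entries currently held in top */
    uint64_t *spill;    /* completely full words, oldest first */
    size_t    nspill, cap;
} Stack2;

/* Pushes the low 2 bits of v. Returns 0 on success, -1 if realloc fails. */
int s2_push(Stack2 *s, unsigned v)
{
    if (s->ntop == 32) {                    /* top word full: spill it */
        if (s->nspill == s->cap) {
            size_t ncap = s->cap ? s->cap * 2 : 16;
            uint64_t *p = realloc(s->spill, ncap * sizeof *p);
            if (!p)
                return -1;
            s->spill = p;
            s->cap = ncap;
        }
        s->spill[s->nspill++] = s->top;
        s->top = 0;
        s->ntop = 0;
    }
    s->top = (s->top << 2) | (v & 3u);
    s->ntop++;
    return 0;
}

/* Pops one 2-bit value, or returns -1 if the stack is empty. */
int s2_pop(Stack2 *s)
{
    if (s->ntop == 0) {
        if (s->nspill == 0)
            return -1;
        s->top = s->spill[--s->nspill];     /* reload a full word */
        s->ntop = 32;
    }
    int v = (int)(s->top & 3u);
    s->top >>= 2;
    s->ntop--;
    return v;
}
```

A zero-initialized Stack2 is a valid empty stack, and the caller frees spill when done. Most pushes and pops touch only the top word, so memory traffic happens once per 32 entries.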

The few measurements I have done don't show a big difference in
performance between the two methods. But I admit I wasn't paying
close attention, and like I said only a few patterns of filling were
exercised.

After implementing the first enhancement I noticed that at
4K size the timing (per pixel) for a few of my test cases is
significantly worse than at smaller images. So, I added another
enhancement aiming to minimize cache thrashing effects by never
looking back at the immediate parent of the current block. The info
about the location of the parent nicely fitted into the remaining
2 bits of the stack octet.

The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.

Two further comments.

One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?

Two, and more important, the new algorithm still uses O(NxN) memory
for an N by N pixel field. We really would like to get that down to
O(N) memory (and of course run just as fast). Have you looked into
how that might be done?


From Tim Rentsch@21:1/5 to Michael S on Sat Mar 30 00:54:19 2024

Michael S <[email protected]> writes:

[...]

The most robust code that I found so far that performs well both
with small pictures and with large and huge, is a variation on the
same theme of explicit stack, may be, more properly called trace
back. It operates on 2x2 squares instead of individual pixels.

The worst case auxiliary memory footprint of this variant is rather
big, up to picture_size/4 bytes. The code is *not* simple, but
complexity appears to be necessary for robust performance with
various shapes and sizes.

[...]

I took a cursory look just now, after reading your other later
posting. I think I have a general sense, especially in conjunction
with the explanatory comments.

I'm still hoping to find a method that is both fast and has
good memory use, which is to say O(N) for an NxN pixel field.

Something that would help is to have a library of test cases,
by which I mean patterns to be colored, so that a set of
methods could be tried, and timed, over all the patterns in
the library. Do you have something like that? So far all
my testing has been ad hoc.

Incidentally, it looks like your code assumes X varies more rapidly
than Y, so a "by row" order, whereas my code assumes Y varies more
rapidly than X, a "by column" order. The difference doesn't matter
as long as the pixel field is square and the test cases either are
symmetric about the X == Y axis or duplicate a non-symmetric pattern
about the X == Y axis. I would like to be able to run comparisons
between different methods and get usable results without having
to jump around because of different orientations. I'm not sure
how to accommodate that.
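
One possible way to accommodate the two orientations, offered only as a sketch: generate each pattern once in "by row" order and copy it into "by column" order before handing it to a column-major fill. The function name transpose_field is hypothetical.

```c
#include <stddef.h>

/* Copies a w-by-h field stored by row (pixel (x, y) at src[y*w + x])
   into by-column storage (pixel (x, y) at dst[x*h + y]).
   src and dst must not overlap and must each hold w*h bytes. */
void transpose_field(const unsigned char *src, unsigned char *dst,
                     size_t w, size_t h)
{
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            dst[x * h + y] = src[y * w + x];
}
```

With this, both methods see the same picture, so non-square fields and non-symmetric patterns can be compared directly. The copy should be done outside the timed region.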


From Michael S@21:1/5 to Tim Rentsch on Sat Mar 30 21:15:06 2024

On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <[email protected]> wrote:

I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.

Just a habit.
In "real" work, as opposed to hobby, I use gcc almost exclusively for
small embedded targets and quite often with third-party libraries in
source form. In such an environment raising the warning level above -Wall
would be counterproductive, because it would be hard to see the relevant
warnings behind walls of false alarms.
Maybe, for hobby work, where I have full control of everything, switching
to -Wpedantic is not a bad idea.

An alternate idea is to use a 64-bit integer for 32 "top of stack"
elements, or up to 32 I should say, and a stack with 64-bit values.
Just an idea, it may not turn out to be useful.

That's just a detail of how to do pack/unpack with minimal
overhead. It does not change the principle that the 'packed' version would
be less memory hungry but on a modern PC with GBs of RAM it will not be
faster than the original.
Memory footprint can directly affect speed when access patterns have
poor locality or when the rate of access exceeds 10-20 GB/s. In our
case locality of stack access is very good and the rate of stack
access, even on ultra fast processor, is less than 1 GB/s.

The few measurements I have done don't show a big difference in
performance between the two methods. But I admit I wasn't paying
close attention, and like I said only a few patterns of filling were
exercised.

After implementing the first enhancement I noticed that at
4K size the timing (per pixel) for a few of my test cases is
significantly worse than at smaller images. So, I added another
enhancement aiming to minimize cache thrashing effects by never
looking back at the immediate parent of the current block. The info
about the location of the parent nicely fitted into the remaining
2 bits of the stack octet.

The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.

I call it a principle of Lot's wife.
That is yet another reason to not grow blocks above 2x2.
For bigger blocks it does not apply.

Two further comments.

One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?

On my 11 y.o. and not top-of-the-line even then home PC for 4K
image (3840 x 2160) with cross-in-cross shape that I took from one of
your previous posts, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway, right
now I have no test systems more modern than 3 y.o. Zen3.

Two, and more important, the new algorithm still uses O(NxN) memory
for an N by N pixel field. We really would like to get that down to
O(N) memory (and of course run just as fast). Have you looked into
how that might be done?

Using this particular principle of not saving (x,y) in auxiliary
storage, I don't believe that it is possible to have a footprint
smaller than O(W*H).


From Tim Rentsch@21:1/5 to Michael S on Sat Mar 30 11:59:03 2024

Michael S <[email protected]> writes:

On Thu, 28 Mar 2024 23:04:36 -0700
Tim Rentsch <[email protected]> wrote:

Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.

If non-trivially different, why not?

Here is the code:

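#include <assert.h>
#include <stdlib.h>

/* The typedefs below are not part of this post; they are assumed here
   so that the function compiles on its own. */
typedef unsigned int UI;
typedef unsigned char UC;
typedef unsigned long long U64;
typedef struct { UI x, y; } Point;

void
stack_fill( UI w0, UI h0, UC pixels[], Point p0, UC old, UC new ){
    U64 w = ( assert( w0 > 0 ), w0 );
    U64 h = ( assert( h0 > 0 ), h0 );
    U64 px = ( assert( p0.x < w ), p0.x );
    U64 py = ( assert( p0.y < h ), p0.y );

    /* Pixels are stored by column: x points at the current column and
       y indexes within it, so the current pixel is x[y]. */
    UC *x0 = ( assert( pixels ), pixels );
    UC *x = x0 + px*h;
    UC *xm = x0 + h*w - h;

    U64 y0 = 0;
    U64 y = py;
    U64 ym = h-1;

    /* Trace-back stack: one byte per step, holding the direction taken. */
    UC *s0 = malloc( sizeof *s0 );
    UC *s = s0;
    UC *sn = s0 ? s0+1 : s0;

    /* old != new guards against walking the same pixels forever. */
    if( s0 && old != new && x[y] == old ) do {
        x[y] = new;
        if( s == sn ){              /* stack full: grow by about 1.5x */
            U64 s_offset = s - s0;
            U64 n = (sn-s0+1) *3 /2;
            UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

            if( ! new_s0 ) break;
            s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
        }

        /* Each branch steps to a neighbour, pushes a "return address" and
           restarts the loop; the UNDO_ label reverses the step on return. */
        if( y < ym && x[y+1] == old ){
            y += 1, *s++ = 2; continue; UNDO_UP:
            y -= 1;
        }
        if( y > y0 && x[y-1] == old ){
            y -= 1, *s++ = 3; continue; UNDO_DOWN:
            y += 1;
        }
        /* y[x+h] is the same element as (x+h)[y]: one column over. */
        if( x < xm && y[x+h] == old ){
            x += h, *s++ = 0; continue; UNDO_LEFT:
            x -= h;
        }
        if( x > x0 && y[x-h] == old ){
            x -= h, *s++ = 1; continue; UNDO_RIGHT:
            x += h;
        }

        if( s == s0 ) break;        /* back at the seed: done */

        switch( *--s & 3 ){         /* "return" to the pushed address */
            case 0: goto UNDO_LEFT;
            case 1: goto UNDO_RIGHT;
            case 2: goto UNDO_UP;
            case 3: goto UNDO_DOWN;
        }
    } while( 1 );

    free( s0 );
}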


From Michael S@21:1/5 to Tim Rentsch on Sat Mar 30 21:26:57 2024

On Sat, 30 Mar 2024 00:54:19 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

The most robust code that I found so far that performs well both
with small pictures and with large and huge, is a variation on the
same theme of explicit stack, may be, more properly called trace
back. It operates on 2x2 squares instead of individual pixels.

The worst case auxiliary memory footprint of this variant is rather
big, up to picture_size/4 bytes. The code is *not* simple, but
complexity appears to be necessary for robust performance with
various shapes and sizes.

[...]

I took a cursory look just now, after reading your other later
posting. I think I have a general sense, especially in conjunction
with the explanatory comments.

I'm still hoping to find a method that is both fast and has
good memory use, which is to say O(N) for an NxN pixel field.

Something that would help is to have a library of test cases,
by which I mean patterns to be colored, so that a set of
methods could be tried, and timed, over all the patterns in
the library. Do you have something like that? So far all
my testing has been ad hoc.

I am not 100% sure about the meaning of 'ad hoc', but I'd guess that
mine are ad hoc too. Below are shapes that I use apart from solid
rectangles. I run them at 5 sizes: 25x19, 200x200, 1280x720, 1920x1080,
3840x2160. That is certainly not enough for correctness tests, but I feel
that it is sufficient for speed tests.

static void make_standing_snake(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
for (int y = 0; y < height; ++y) {
unsigned char* p = &image[y*width];
if (y % 2 == 0) {
memset(p, pen_c, width);
} else {
memset(p, background_c, width);
if (y % 4 == 1)
p[width-1] = pen_c;
else
p[0] = pen_c;
}
}
}

static void make_prostrate_snake(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
memset(image, background_c, sizeof(*image)*width*height);
// vertical bars
for (int y = 0; y < height; ++y)
for (int x = 0; x < width; x += 2)
image[y*width+x] = pen_c;

// connect bars at top
for (int x = 3; x < width; x += 4)
image[x] = pen_c;

// connect bars at bottom
for (int x = 1; x < width; x += 4)
image[(height-1)*width+x] = pen_c;
}

static void make_slalom(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
const int n_col = width/3;
const int n_row = (height-3)/4;

// top row
// P B B P P P
for (int col = 0; col < n_col; ++col) {
unsigned char c = (col & 1)==0 ? background_c : pen_c;
image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
}
for (int x = n_col*3; x < width; ++x)
image[x] = image[n_col*3-1];

// main image: consists of 3x4 blocks filled by following pattern
// P B B
// P P B
// B P B
// P P B
for (int row = 0; row < n_row; ++row) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[(row*4+1)*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c; p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
p[0] = background_c; p[1] = pen_c; p[2] = background_c; p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
}
}

// near-bottom rows
// P B B
for (int y = n_row*4+1; y < height-1; ++y) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[y*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c;
}
}

// bottom row - all P
// P P P P B B
unsigned char *b_row = &image[width*(height-1)];
for (int col = 0; col < n_col; ++col) {
unsigned char c = (col & 1)==1 ? background_c : pen_c;
b_row[col*3+0] = pen_c;
b_row[col*3+1] = c;
b_row[col*3+2] = c;
}
for (int x = n_col*3; x < width; ++x)
b_row[x] = b_row[n_col*3-1];

// rightmost columns
for (int x = n_col*3; x < width; ++x) {
for (int y = 1; y < height-1; ++y)
image[y*width+x] = background_c;
}
}

static void make_slalom90(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
const int n_col = (width-3)/4;
const int n_row = height/3;

// leftmost column
// P
// B
// B
// P
// P
// P
for (int row = 0; row < n_row; ++row) {
unsigned char c = (row & 1)==0 ? background_c : pen_c;
image[(row*3+0)*width] = pen_c;
image[(row*3+1)*width] = c;
image[(row*3+2)*width] = c;
}
for (int y = n_row*3; y < height; ++y)
image[y*width] = image[(n_row*3-1)*width];

// main image: consists of 4x3 blocks filled by following pattern
// P P B P
// B P P P
// B B B B
for (int row = 0; row < n_row; ++row) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[(row*3*width)+(col*4+1)];
p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p[3] = pen_c; p += width;
p[0] = background_c; p[1] = pen_c; p[2] = pen_c; p[3] = pen_c; p += width;
p[0] = background_c; p[1] = background_c; p[2] = background_c; p[3] = background_c;
}
}

// near-rightmost column
// P
// B
// B
for (int row = 0; row < n_row; ++row) {
for (int x = n_col*4+1; x < width-1; ++x) {
unsigned char* p = &image[row*width*3+x];
p[0*width] = pen_c;
p[1*width] = background_c;
p[2*width] = background_c;
}
}

// rightmost column
// P
// P
// P
// P
// B
// B
unsigned char *r_col = &image[width-1];
for (int row = 0; row < n_row; ++row) {
unsigned char c = (row & 1)==1 ? background_c : pen_c;
r_col[(row*3+0)*width] = pen_c;
r_col[(row*3+1)*width] = c;
r_col[(row*3+2)*width] = c;
}
for (int y = n_row*3; y < height; ++y)
r_col[y*width] = r_col[(n_row*3-1)*width];

// bottom rows
for (int y = n_row*3; y < height; ++y) {
for (int x = 1; x < width-1; ++x)
image[y*width+x] = background_c;
}
}

static void make_crosss_in_cross(
unsigned char* image,
int width,
int height,
int xc,
int yc,
unsigned char background_c,
unsigned char pen_c)
{
memset(image, pen_c, width*height);

if (xc > 1 && xc+1 < width-1 && yc > 1 && yc+1 < height-1) {
memset(&image[(yc-1)*width+1], background_c, xc-1);
memset(&image[(yc+1)*width+1], background_c, xc-1);
memset(&image[(yc-1)*width+xc+1], background_c, width-xc-2);
memset(&image[(yc+1)*width+xc+1], background_c, width-xc-2);
for (int y = 1; y < yc; ++y) {
image[y*width+xc-1] = background_c;
image[y*width+xc+1] = background_c;
}
for (int y = yc+1; y < height-1; ++y) {
image[y*width+xc-1] = background_c;
image[y*width+xc+1] = background_c;
}
}
}
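
Since most of the generators above share one signature, a small driver can run any pattern against any fill method and time it. The following is only a sketch under stated assumptions: run_case, pattern_fn, fill_fn, make_solid and sweep_fill are names invented here, the seed is taken to be pixel (0,0), and the "nothing left in pen colour" check is valid only for patterns whose pen pixels form one connected region containing (0,0). sweep_fill is a deliberately naive stand-in so the sketch is self-contained; it is not one of the methods discussed in the thread.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*pattern_fn)(unsigned char *image, int width, int height,
                           unsigned char background_c, unsigned char pen_c);
typedef void (*fill_fn)(unsigned char *image, int width, int height,
                        int x, int y,
                        unsigned char old_c, unsigned char new_c);

/* Hypothetical stand-in pattern: a solid rectangle of pen colour. */
void make_solid(unsigned char *image, int width, int height,
                unsigned char background_c, unsigned char pen_c)
{
    (void)background_c;
    memset(image, pen_c, (size_t)width * height);
}

/* Hypothetical stand-in fill: sweep the image until nothing changes.
   Slow, but needs no auxiliary memory. Assumes new_c does not occur
   in the image before the call. */
void sweep_fill(unsigned char *image, int width, int height, int x, int y,
                unsigned char old_c, unsigned char new_c)
{
    if (old_c == new_c || image[(size_t)y * width + x] != old_c)
        return;
    image[(size_t)y * width + x] = new_c;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int j = 0; j < height; ++j)
            for (int i = 0; i < width; ++i) {
                size_t p = (size_t)j * width + i;
                if (image[p] != old_c)
                    continue;
                if ((i > 0 && image[p - 1] == new_c) ||
                    (i + 1 < width && image[p + 1] == new_c) ||
                    (j > 0 && image[p - width] == new_c) ||
                    (j + 1 < height && image[p + width] == new_c)) {
                    image[p] = new_c;
                    changed = 1;
                }
            }
    }
}

/* Builds the pattern, fills from (0,0), and returns CPU seconds spent in
   the fill, or -1.0 if any pen-coloured pixel is left afterwards. */
double run_case(pattern_fn make, fill_fn fill,
                unsigned char *image, int width, int height)
{
    enum { BACKGROUND = 0, PEN = 1, FILLED = 2 };
    make(image, width, height, BACKGROUND, PEN);
    clock_t t0 = clock();
    fill(image, width, height, 0, 0, PEN, FILLED);
    clock_t t1 = clock();
    for (size_t p = 0; p < (size_t)width * height; ++p)
        if (image[p] == PEN)
            return -1.0;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Looping run_case over an array of generators and an array of fill functions, at each of the five sizes, gives the kind of pattern library run asked about earlier in the thread. For small images the fill would need to be repeated to get a measurable time.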

Incidentally, it looks like your code assumes X varies more rapidly
than Y, so a "by row" order, whereas my code assumes Y varies more
rapidly than X, a "by column" order.

It is not so much about what I assume as about what is cheaper for
CPU hardware.

The difference doesn't matter
as long as the pixel field is square and the test cases either are
symmetric about the X == Y axis or duplicate a non-symmetric pattern
about the X == Y axis. I would like to be able to run comparisons
between different methods and get usable results without having
to jump around because of different orientations. I'm not sure
how to accommodate that.


From Michael S@21:1/5 to Michael S on Sun Mar 31 10:54:38 2024

On Sat, 30 Mar 2024 21:15:06 +0300
Michael S <[email protected]> wrote:

On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <[email protected]> wrote:

One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?

On my 11 y.o. and not top-of-the-line even then home PC for 4K
image (3840 x 2160) with cross-in-cross shape that I took from one of
your previous posts, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway, right
now I have no test systems more modern than 3 y.o. Zen3.

I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
Zen3 (EPYC 7543P).
Here I no longer see a significant drop in speed of the 1x1 variant at 4K
size, but I still see that the more complicated variant provides a nice
speedup: up to 1.56x on Coffee Lake and up to 3x on Zen3.


From Michael S@21:1/5 to Tim Rentsch on Fri Apr 5 17:30:33 2024

On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.

I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.

It seems that in the worst case the strict FIFO algorithm is the same as
the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of a shape for which the memory
consumption I measured for a 3840x2160 image was almost exactly 4x as
much as for 1920x1080.

static void make_fractal_tree_recursive(
unsigned char* image,
int width,
int nx,
int ny,
unsigned char pen_c)
{
if (nx < 3 && ny < 3) {
// small rectangle - solid fill
for (int y = 0; y < ny; ++y)
for (int x = 0; x < nx; ++x)
image[width*y+x] = pen_c;
return;
}
if (nx >= ny) {
int xc = (nx-1)/2;
if (xc - 1 > 0) { // left sub-plot
make_fractal_tree_recursive(image, width, xc - 1, ny, pen_c);
}
if (xc + 2 < nx) { // right sub-plot
make_fractal_tree_recursive(&image[xc+2], width,
nx - (xc + 2), ny, pen_c);
}
// draw vertical cross
for (int y = 0; y < ny; ++y)
image[width*y+xc] = pen_c;
int yc = (ny-1)/2;
image[width*yc+xc-1] = pen_c;
image[width*yc+xc+1] = pen_c;
} else {
int yc = (ny-1)/2;
if (yc - 1 > 0) { // upper sub-plot
make_fractal_tree_recursive(image, width, nx, yc - 1, pen_c);
}
if (yc + 2 < ny) { // lower sub-plot
make_fractal_tree_recursive(&image[(yc+2)*width], width, nx,
ny -(yc + 2), pen_c);
}
// draw horizontal cross
for (int x = 0; x < nx; ++x)
image[width*yc+x] = pen_c;
int xc = (nx-1)/2;
image[width*(yc-1)+xc] = pen_c;
image[width*(yc+1)+xc] = pen_c;
}
}

static void make_fractal_tree(
unsigned char* image,
int width,
int height,
unsigned char background_c,
unsigned char pen_c)
{
memset(image, background_c, width*height);
make_fractal_tree_recursive(image, width, width, height, pen_c);
}
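
To reproduce this kind of measurement, a queue-based fill can record its own peak occupancy. The sketch below is an assumption-laden illustration, not the FIFO algorithm posted earlier in the thread: fifo_fill_peak is a made-up name, the image is row-major, and the queue is a plain array sized for the worst case rather than a ring buffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Breadth-first fill of the 4-connected region of old_c containing (x, y).
   Returns the largest number of pixels queued at any one time, 0 if
   there was nothing to fill, or -1 if malloc fails. */
long fifo_fill_peak(unsigned char *img, int w, int h, int x, int y,
                    unsigned char old_c, unsigned char new_c)
{
    size_t seed = (size_t)y * w + x;
    if (old_c == new_c || img[seed] != old_c)
        return 0;

    /* Each pixel is recolored when queued, so it is queued at most once
       and w*h slots always suffice. */
    size_t *q = malloc((size_t)w * h * sizeof *q);
    if (!q)
        return -1;

    size_t head = 0, tail = 0;
    long peak = 0;
    img[seed] = new_c;
    q[tail++] = seed;

    while (head < tail) {
        long depth = (long)(tail - head);
        if (depth > peak)
            peak = depth;

        size_t p = q[head++];
        int px = (int)(p % (size_t)w);
        int py = (int)(p / (size_t)w);

        if (px + 1 < w && img[p + 1] == old_c) { img[p + 1] = new_c; q[tail++] = p + 1; }
        if (px > 0     && img[p - 1] == old_c) { img[p - 1] = new_c; q[tail++] = p - 1; }
        if (py + 1 < h && img[p + w] == old_c) { img[p + w] = new_c; q[tail++] = p + w; }
        if (py > 0     && img[p - w] == old_c) { img[p - w] = new_c; q[tail++] = p - w; }
    }
    free(q);
    return peak;
}
```

Running it on the same shape at 1920x1080 and at 3840x2160 and comparing the two return values shows directly whether the peak grows with the side length or with the area.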


From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 01:00:34 2024

Michael S <[email protected]> writes:

On Sat, 30 Mar 2024 00:54:19 -0700
Tim Rentsch <[email protected]> wrote:

[...]

Something that would help is to have a library of test cases,
by which I mean patterns to be colored, so that a set of
methods could be tried, and timed, over all the patterns in
the library. Do you have something like that? So far all
my testing has been ad hoc.

I am not 100% sure about the meaning of 'ad hoc', but I'd guess
that mine are ad hoc too. Below are shapes that I use apart from
solid rectangles. I run them at 5 sizes: 25x19, 200x200,
1280x720, 1920x1080, 3840x2160. That is certainly not enough for
correctness tests, but I feel that it is sufficient for speed tests.

[code]

I got these, thank you.

Here is a pattern generating function I wrote for my own
testing. Disclaimer: slightly changed from my original
source, hopefully any errors inadvertently introduced can
be corrected easily. Also, it uses the value 0 for the
background and the value 1 for the pattern to be colored.

#include <math.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char Pixel;

extern void
ellipse_with_hole( Pixel *field, unsigned w, unsigned h ){
size_t i, j;
double wc = w/2, hc = h/2;

double a = (w > h ? wc : hc) -1;
double b = (w > h ? hc : wc) -1;

double b3 = 1+6*b/8;
double radius = b/2.5;
double cx = w > h ? wc : b3+1;
double cy = w > h ? b3+1 : hc;

double focus = sqrt( a*a - b*b );
double f1x = w > h ? wc - focus : wc;
double f1y = w > h ? hc : hc - focus;
double f2x = w > h ? wc + focus : wc;
double f2y = w > h ? hc : hc + focus;

memset( field, 0, w*h );

for( i = 0; i < w; i++ ){
for( j = 0; j < h; j++ ){
double dx = i - cx, dy = j - cy;
double r2 = radius * radius;
if( dx * dx + dy*dy <= r2 ) continue;
double dx1 = i - f1x, dy1 = j - f1y;
double dx2 = i - f2x, dy2 = j - f2y;
double sum2 = a*2;
double d1 = sqrt( dx1*dx1 + dy1*dy1 );
double d2 = sqrt( dx2*dx2 + dy2*dy2 );
if( d1 + d2 > sum2 ) continue;
field[ i+j*w ] = 1;
}}
}


From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 01:55:39 2024

Michael S <[email protected]> writes:

On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <[email protected]> wrote:

I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.

Just a habit.
In "real" work, as opposed to hobby, I use gcc almost exclusively for
small embedded targets and quite often with third-party libraries in
source form. In such an environment raising the warning level above -Wall
would be counterproductive, because it would be hard to see the relevant
warnings behind walls of false alarms.
Maybe, for hobby work, where I have full control of everything, switching
to -Wpedantic is not a bad idea.

My experience with third party libraries is that sometimes they use
extensions, probably mostly gcc-isms. Not much to be done in that
case. Of course turning on -pedantic could be done selectively.

It might be worth an experiment of turning off -Wall while turning
on -pedantic to see how big or how little the problem is.

The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.

I call it a principle of Lot's wife.
That is yet another reason to not grow blocks above 2x2.
For bigger blocks it does not apply.

Here is an updated version of my "stacking" code. On my test
system (and I don't even know exactly what CPU it has, probably
about 5 years old) this code runs about 30% faster than your 2x2
version, averaged across all patterns and all sizes above the
smallest ones (25x19 and 19x25).

#include <assert.h>
#include <stdlib.h>

typedef unsigned char UC, Color;
typedef size_t Index, Count;
typedef struct { Index x, y; } Point;

extern Count
stack_plus( UC *field, Index w, Index h, Point p0, Color old, Color new ){
Index px = ( assert( p0.x < w ), p0.x );
Index py = ( assert( p0.y < h ), p0.y );

Index x0 = 0;
Index x = px;
Index xm = w-1;

UC *y0 = field;
UC *y = y0 + py*w;
UC *ym = y0 + h*w - w;

UC *s0 = malloc( 8 * sizeof *s0 );
UC *s = s0;
UC *sn = s0 ? s0+8 : s0;

Count r = 0;

if( s0 && old != new ) goto START_FOUR; /* old == new would never terminate */

while( s != s0 ){
switch( *--s & 15 ){
case 0: goto UNDO_START_LEFT;
case 1: goto UNDO_START_RIGHT;
case 2: goto UNDO_START_UP;
case 3: goto UNDO_START_DOWN;

case 4: goto UNDO_LEFT_DOWN;
case 5: goto UNDO_LEFT_LEFT;
case 6: goto UNDO_LEFT_UP;

case 7: goto UNDO_UP_LEFT;
case 8: goto UNDO_UP_UP;
case 9: goto UNDO_UP_RIGHT;

case 10: goto UNDO_RIGHT_UP;
case 11: goto UNDO_RIGHT_RIGHT;
case 12: goto UNDO_RIGHT_DOWN;

case 13: goto UNDO_DOWN_RIGHT;
case 14: goto UNDO_DOWN_DOWN;
case 15: goto UNDO_DOWN_LEFT;
}

START_FOUR:
if( y[x] != old ) continue;
y[x] = new; r++;
if( x < xm && y[x+1] == old ){
x += 1, *s++ = 0; goto START_LEFT; UNDO_START_LEFT:
x -= 1;
}
if( x > x0 && y[x-1] == old ){
x -= 1, *s++ = 1; goto START_RIGHT; UNDO_START_RIGHT:
x += 1;
}
if( y < ym && x[y+w] == old ){
y += w, *s++ = 2; goto START_UP; UNDO_START_UP:
y -= w;
}
if( y > y0 && x[y-w] == old ){
y -= w, *s++ = 3; goto START_DOWN; UNDO_START_DOWN:
y += w;
}
continue;

START_LEFT:
y[x] = new; r++;
if( s == sn ){
Index s_offset = s - s0;
Index n = (sn-s0+1) *3 /2;
UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

if( ! new_s0 ) break;
s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
}
if( x < xm && y[x+1] == old ){
x += 1, *s++ = 5; goto START_LEFT; UNDO_LEFT_LEFT:
x -= 1;
}
if( y > y0 && x[y-w] == old ){
y -= w, *s++ = 4; goto START_DOWN; UNDO_LEFT_DOWN:
y += w;
}
if( y < ym && x[y+w] == old ){
y += w, *s++ = 6; goto START_UP; UNDO_LEFT_UP:
y -= w;
}
continue;

START_UP:
y[x] = new; r++;
if( s == sn ){
Index s_offset = s - s0;
Index n = (sn-s0+1) *3 /2;
UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

if( ! new_s0 ) break;
s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
}
if( x < xm && y[x+1] == old ){
x += 1, *s++ = 7; goto START_LEFT; UNDO_UP_LEFT:
x -= 1;
}
if( x > x0 && y[x-1] == old ){
x -= 1, *s++ = 9; goto START_RIGHT; UNDO_UP_RIGHT:
x += 1;
}
if( y < ym && x[y+w] == old ){
y += w, *s++ = 8; goto START_UP; UNDO_UP_UP:
y -= w;
}
continue;

START_RIGHT:
y[x] = new; r++;
if( s == sn ){
Index s_offset = s - s0;
Index n = (sn-s0+1) *3 /2;
UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

if( ! new_s0 ) break;
s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
}
if( x > x0 && y[x-1] == old ){
x -= 1, *s++ = 11; goto START_RIGHT; UNDO_RIGHT_RIGHT:
x += 1;
}
if( y < ym && x[y+w] == old ){
y += w, *s++ = 10; goto START_UP; UNDO_RIGHT_UP:
y -= w;
}
if( y > y0 && x[y-w] == old ){
y -= w, *s++ = 12; goto START_DOWN; UNDO_RIGHT_DOWN:
y += w;
}
continue;

START_DOWN:
y[x] = new; r++;
if( s == sn ){
Index s_offset = s - s0;
Index n = (sn-s0+1) *3 /2;
UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

if( ! new_s0 ) break;
s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
}
if( x > x0 && y[x-1] == old ){
x -= 1, *s++ = 13; goto START_RIGHT; UNDO_DOWN_RIGHT:
x += 1;
}
if( x < xm && y[x+1] == old ){
x += 1, *s++ = 15; goto START_LEFT; UNDO_DOWN_LEFT:
x -= 1;
}
if( y > y0 && x[y-w] == old ){
y -= w, *s++ = 14; goto START_DOWN; UNDO_DOWN_DOWN:
y += w;
}
continue;

}

return free( s0 ), r;
}


From Tim Rentsch@21:1/5 to Michael S on Tue Apr 9 02:32:31 2024

Michael S <[email protected]> writes:

On Sat, 30 Mar 2024 21:15:06 +0300
Michael S <[email protected]> wrote:

On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <[email protected]> wrote:

One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?

On my 11 y.o. and not top-of-the-line even then home PC for 4K
image (3840 x 2160) with cross-in-cross shape that I took from one of
your previous posts, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway, right
now I have no test systems more modern than 3 y.o. Zen3.

I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
Zen3 (EPYC 7543P).
Here I no longer see a significant drop in speed of the 1x1 variant at 4K
size, but I still see that the more complicated variant provides a nice
speedup: up to 1.56x on Coffee Lake and up to 3x on Zen3.

On my test system the numbers are closer and also more evenly
balanced: ratios range from about 0.70 to about 1.40, roughly
evenly split with the 2x2 version somewhat better. There was
one outlier at approximately 1.48. More precisely, the ratios
have an average of 1.06 (which means the 1x1 version is about
6 percent slower on average), with a standard deviation of 0.21.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Tim Rentsch@21:1/5 to Michael S on Wed Apr 10 19:47:11 2024

Michael S <[email protected]> writes:

On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.

I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.

Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read

Index const jx = used*3 < open ? k : j+open/3 &m;

rather than what it used to. (The type may have changed but that
is incidental; what matters is the value of the initializing
expression.) I don't know what I was thinking when I wrote the
previous version, it's just completely wrong.

It seems that in worst case the strict FIFO algorithm is the same as
the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of the shape for which I measured
memory consumption for 3840x2160 image almost exactly 4x as much as
for 1920x1080.

I agree, the empirical evidence here and in my own tests is quite
compelling.

That said, the constant factor for the FIFO algorithm is lower
than the stack-based algorithms, even taking into account the
difference in sizes for queue and stack elements. Moreover cases
where FIFO algorithms are O( NxN ) are unusual and sparse,
whereas the stack-based algorithms tend to use a lot of memory
in lots of common and routine cases. On the average FIFO
algorithms typically use a lot less memory (or so I conjecture).

[code to generate fractal tree pattern]

Thank you for this. I incorporated it into my set of test
patterns more or less as soon as it was posted.

Now that I have taken some time to play around with different
algorithms and have been more systematic in doing speed
comparisons between different algorithms, on different patterns,
and with a good range of sizes, I have some general thoughts
to offer.

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

I've been playing around with a more elaborate, mostly FIFO
method, in hopes of getting something that offers the best
of both worlds. The results so far are encouraging, but a
fair amount of tuning has been necessary (and perhaps more
still is), and comparisons have been done on just the one
test server I have available. So I don't know how well it
would hold up on other hardware, including especially more
recent hardware. Under these circumstances I feel it is
premature to post actual code, especially since the code
is still in flux.

This topic has been more interesting than I was expecting, and
also more challenging. I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.
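
One rough way to look at the "frontier" question discussed above is to instrument a plain breadth-first fill and record the peak queue length. The sketch below is not the FIFO algorithm posted in this thread: the name bfs_fill_peak, the flat queue of w*h entries and the peak out-parameter are all assumptions made for illustration. In a solid rectangle the queue only ever holds pixels from two adjacent distance levels, each of at most 2*min(w,h) pixels, so the peak stays within 4*min(w,h) for that shape. Other shapes can behave worse, as the measurements in this thread show.

```c
#include <assert.h>
#include <stdlib.h>

/* Breadth-first fill that reports the peak queue length.
 * Hypothetical helper for illustration; not code from this thread.
 * Pixels are recolored when enqueued, so each pixel is enqueued at
 * most once and a queue of w*h entries can never overflow.
 * Returns the number of recolored pixels (0 on bad input or failed malloc). */
static size_t bfs_fill_peak(unsigned char *field, size_t w, size_t h,
                            size_t x0, size_t y0,
                            unsigned char old, unsigned char new_c,
                            size_t *peak)
{
    assert(field && peak);
    *peak = 0;
    if (old == new_c || x0 >= w || y0 >= h || field[x0 + y0 * w] != old)
        return 0;

    size_t *queue = malloc(w * h * sizeof *queue);
    if (!queue)
        return 0;

    size_t head = 0, tail = 0, count = 0;
    field[x0 + y0 * w] = new_c;
    queue[tail++] = x0 + y0 * w;

    while (head != tail) {
        if (tail - head > *peak)          /* queue length before this dequeue */
            *peak = tail - head;
        size_t p = queue[head++];
        size_t x = p % w, y = p / w;
        count++;
        if (x > 0     && field[p - 1] == old) { field[p - 1] = new_c; queue[tail++] = p - 1; }
        if (x + 1 < w && field[p + 1] == old) { field[p + 1] = new_c; queue[tail++] = p + 1; }
        if (y > 0     && field[p - w] == old) { field[p - w] = new_c; queue[tail++] = p - w; }
        if (y + 1 < h && field[p + w] == old) { field[p + w] = new_c; queue[tail++] = p + w; }
    }
    free(queue);
    return count;
}
```

Calling it on a zero-filled 200 x 200 field from (100, 100) recolors all 40000 pixels. Running the same helper on the snake or fractal patterns would be one way to compare peak queue lengths across shapes.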


From Michael S@21:1/5 to Tim Rentsch on Thu Apr 11 15:20:33 2024

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Generally, I like your algorithm.
It was surprising for me that queue can work better than stack,
my intuition suggested otherwise, but facts are facts.

Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).

For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.

I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.

Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read

Index const jx = used*3 < open ? k : j+open/3 &m;

I lost track, sorry. I cannot find your code that contains a line
similar to this.
Can you point to the specific post?

rather than what it used to. (The type may have changed but that
is incidental; what matters is the value of the initializing
expression.) I don't know what I was thinking when I wrote the
previous version, it's just completely wrong.

It seems that in worst case the strict FIFO algorithm is the same as
the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of the shape for which I measured
memory consumption for 3840x2160 image almost exactly 4x as much as
for 1920x1080.

I agree, the empirical evidence here and in my own tests is quite
compelling.

BTW, I no longer agree with myself about "the rest of them".
By now, I know at least one method that is O(W*log(H)). It is even
quite fast for the majority of my test shapes. Unfortunately, [in its
current form] it is abysmally slow (100x) for a minority of tests.
[In its current form] it has other disadvantages as well, like
consuming a non-trivial amount of memory when handling a small spot in
a big image. But that can be improved. I am less sure that worst-case
speed can be improved enough to make it generally acceptable.

I think I said enough for you to figure out the general principle of
this algorithm. I don't want to post code here before I try a few
improvements.

That said, the constant factor for the FIFO algorithm is lower
than the stack-based algorithms, even taking into account the
difference in sizes for queue and stack elements. Moreover cases
where FIFO algorithms are O( NxN ) are unusual and sparse,
whereas the stack-based algorithms tend to use a lot of memory
in lots of common and routine cases. On the average FIFO
algorithms typically use a lot less memory (or so I conjecture).

[code to generate fractal tree pattern]

Thank you for this. I incorporated it into my set of test
patterns more or less as soon as it was posted.

Now that I have taken some time to play around with different
algorithms and have been more systematic in doing speed
comparisons between different algorithms, on different patterns,
and with a good range of sizes, I have some general thoughts
to offer.

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my tests
on my hardware with this sort of shape it is easily faster than
anything else. The difference vs the best of the rest is especially big
at 4K images on AMD Zen3 based hardware, but even on Intel Skylake, which
generally serves as an equalizer between different algorithms, the speed
advantage of 2x2 stack is significant.

I've been playing around with a more elaborate, mostly FIFO
method, in hopes of getting something that offers the best
of both worlds. The results so far are encouraging, but a
fair amount of tuning has been necessary (and perhaps more
still is), and comparisons have been done on just the one
test server I have available. So I don't know how well it
would hold up on other hardware, including especially more
recent hardware. Under these circumstances I feel it is
premature to post actual code, especially since the code
is still in flux.

This topic has been more interesting than I was expecting, and
also more challenging.

That's not the first time in my practice that a problem with a simple
formulation begets interesting challenges.
Didn't Donald Knuth write 300 or 400 pages about sorting and still
end up quite far away from exhausting the topic?

I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.

So why not break it down into smaller pieces?
Myself, I have no rules. In my real work I am quite happy with
dispatchers of network messages that are 250-300 lines long. But if I
had this sort of rule, I'd certainly decompose.


From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 21:06:38 2024

Michael S <[email protected]> writes:

(I'm replying in pieces.)

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read

Index const jx = used*3 < open ? k : j+open/3 &m;

I lost track, sorry. I cannot find your code that contains a line
similar to this.
Can you point to the specific post?

Easier for me just to repost the corrected algorithm. The
type UC is an unsigned char, the types Index and Count are
size_t (or maybe unsigned long long), the type U32 is a
32-bit unsigned type.

Please excuse any minor glitches, I have done some hand
editing to take out various bits of diagnostic code.

/* needs <assert.h>, <stdlib.h> and <string.h> */
extern Count
fifo_fill( UC *field, Index w, Index h, Point p0, UC old, UC new ){
    Index const xm = w-1;
    Index const ym = h-1;

    Index j = 0;                    /* read index into the ring buffer */
    Index k = 0;                    /* write index into the ring buffer */
    Index n = 1u << 10;             /* ring size, always a power of two */
    Index m = n-1;                  /* index mask */
    U32  *todo = malloc( n * sizeof *todo );
    Index x = p0.x;
    Index y = p0.y;

    /* free( NULL ) is a no-op, so one exit path covers every failure;
       old == new is rejected because the loop would never terminate */
    if( !todo || old == new || x >= w || y >= h || field[ x+y*w ] != old ){
        return free( todo ), 0;
    }

    /* recolor the seed as it is queued, like every other pixel, so an
       isolated one-pixel region is handled too */
    field[ x+y*w ] = new;
    todo[ k++ ] = x<<16 | y;

    while( j != k ){
        Index used = j < k ? k-j : k+n-j;
        Index open = n - used;
        if( open < used/16 ){       /* nearly full: double the ring */
            Index new_n = n*2;
            Index new_m = new_n-1;
            Index new_j = j < k ? j : j+n;
            U32  *t = realloc( todo, new_n * sizeof *t );
            if( ! t ) break;
            /* a wrapped queue has its tail segment moved to the new top half */
            if( j != new_j ) memcpy( t+new_j, t+j, (n-j) * sizeof *t );
            todo = t, n = new_n, m = new_m, j = new_j, open = n-used;
        }
        assert( (k-j&m) == used && open+used == n );

        Index const jx = used*3 < open ? k : j+open/3 &m;   // here it is!
        while( j != jx ){
            U32 p = todo[j];  j = j+1 &m;
            x = p >> 16,  y = p & 0xFFFF;
            if( x > 0 && field[ x-1 + y*w ] == old ){
                todo[k] = x-1<<16 | y, k = k+1&m, field[ x-1 + y*w ] = new;
            }
            if( y > 0 && field[ x + (y-1)*w ] == old ){
                todo[k] = x<<16 | y-1, k = k+1&m, field[ x + (y-1)*w ] = new;
            }
            if( x < xm && field[ x+1 + y*w ] == old ){
                todo[k] = x+1<<16 | y, k = k+1&m, field[ x+1 + y*w ] = new;
            }
            if( y < ym && field[ x + (y+1)*w ] == old ){
                todo[k] = x<<16 | y+1, k = k+1&m, field[ x + (y+1)*w ] = new;
            }
        }
    }

    return free( todo ), 0;
}
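
The corrected jx line leans on C operator precedence: + binds tighter than &, so j+open/3 &m means (j + open/3) & m. The sketch below spells out the same ring-buffer arithmetic with explicit parentheses. The helper names ring_used and batch_end are made up for illustration, Index is assumed to be size_t as described in the post, and the comment on why the batch is capped at open/3 is my reading of the code, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Index;   /* assumption: the post says Index is size_t or similar */

/* Number of queued entries in a ring of size n (a power of two), with
 * read index j and write index k. Unsigned wraparound makes
 * (k - j) & (n - 1) correct even when k has wrapped around below j. */
static Index ring_used(Index j, Index k, Index n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    return (k - j) & (n - 1);
}

/* Read index at which the inner loop stops. When the queue is small
 * compared with the free space, drain it completely (stop at k).
 * Otherwise process at most open/3 entries, presumably because each
 * processed pixel can append several new entries to the ring. */
static Index batch_end(Index j, Index k, Index n)
{
    Index const m    = n - 1;
    Index const used = ring_used(j, k, n);
    Index const open = n - used;
    return used * 3 < open ? k : ((j + open / 3) & m);
}
```

For example, with n = 1024, j = 1020 and k = 900 the queue holds 904 entries and 120 slots are open, so the batch ends at (1020 + 40) & 1023 = 36, which shows the read index wrapping around the end of the buffer.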


From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:09:51 2024

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even on
Intel Skylake which generally serves as equalizer between different
algorithms, the speed advantage of 2x2 stack is significant.

This comment makes me wonder if I should post my timing results.
Maybe I will (and including an appropriate disclaimer).

I do timings over these sizes:

25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768

with these patterns:

fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center

If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).

Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.


From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 21:55:22 2024

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

It seems that in worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN) where NN is the number of
re-colored points. Below is an example of the shape for which I
measured memory consumption for 3840x2160 image almost exactly 4x
as much as for 1920x1080.

I agree, the empirical evidence here and in my own tests is quite
compelling.

BTW, I no longer agree with myself about "the rest of them".
By now, I know at least one method that is O(W*log(H)). It is even
quite fast for the majority of my test shapes. Unfortunately, [in its
current form] it is abysmally slow (100x) for a minority of tests.
[In its current form] it has other disadvantages as well, like
consuming a non-trivial amount of memory when handling a small spot in
a big image. But that can be improved. I am less sure that worst-case
speed can be improved enough to make it generally acceptable.

I think I said enough for you to figure out the general principle of
this algorithm. I don't want to post code here before I try a few
improvements.

Thank you for the implied compliment. At this point I think the
probability that I will figure it out anytime soon is pretty low.


From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:38:59 2024

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

This topic has been more interesting than I was expecting, and
also more challenging.

That's not the first time in my practice that a problem with a
simple formulation begets interesting challenges.
Didn't Donald Knuth write 300 or 400 pages about sorting and
still end up quite far away from exhausting the topic?

In my copy of volume 3 of TAOCP, the chapter on sorting takes up
388 pages. On the other hand, only 108 pages of that deals with
what we normally think of as sorting algorithms today, and even
that part is longer than it needs to be because of Knuth's
exhaustive (and exhausting) writing style. Don Knuth would
never write a book in the style of The C Programming Language.


From Tim Rentsch@21:1/5 to Michael S on Thu Apr 11 22:43:10 2024

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.

So why not break it down into smaller pieces?

The better algorithms I have done are long and also make liberal
use of goto's. Maybe it isn't impossible to break one or more
of these algorithms into smaller pieces, but C doesn't make it
easy.


From Michael S@21:1/5 to Tim Rentsch on Fri Apr 12 11:13:05 2024

On Thu, 11 Apr 2024 22:09:51 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even on
Intel Skylake which generally serves as equalizer between different
algorithms, the speed advantage of 2x2 stack is significant.

This comment makes me wonder if I should post my timing results.
Maybe I will (and including an appropriate disclaimer).

I do timings over these sizes:

25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768

I didn't go that far up (I stopped at 4K) and I only test landscape sizes.
Maybe I'll add a portrait option to see anisotropic behaviors.
For bigger sizes, correctness is interesting, speed not so much, since
they are unlikely to be edited in an interactive manner.

with these patterns:

fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center

If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).

Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.

My suite is about the same, with the following exceptions:
1. I didn't add donut yet
2. + 3 greeds with cell size 2, 3 and 4
3. + fractal tree
4. + entire field starting from corner
It seems neither of us tests the cases in which linear dimensions of
the shape are much smaller than those of the field.

static void make_grid(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c, int cell_sz)
{
    for (int y = 0; y < height; ++y) {
        unsigned char* p = &image[y*width];
        if (y % cell_sz == 0) {
            memset(p, pen_c, width);
        } else {
            for (int x = 0; x < width; ++x)
                p[x] = x % cell_sz ? background_c : pen_c;
        }
    }
}


From Tim Rentsch@21:1/5 to Michael S on Fri Apr 12 11:59:25 2024

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even
on Intel Skylake which generally serves as equalizer between
different algorithms, the speed advantage of 2x2 stack is
significant.

I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.


From Tim Rentsch@21:1/5 to Michael S on Sat Apr 13 08:30:03 2024

Michael S <[email protected]> writes:

On Thu, 11 Apr 2024 22:09:51 -0700
Tim Rentsch <[email protected]> wrote:

[...]

I do timings over these sizes:

25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768

I didn't go that far up (ended at 4K)

I test large sizes for three reasons. One, even if viewable
area is smaller, virtual displays might be much larger. Two,
to see how the algorithms scale. Three, larger areas have
relatively less influence from edge effects.

Also I have now added

275 x 25 25 x 275
400 x 300 300 x 400
640 x 480 480 x 640
1600 x 900 900 x 1600
16000 x 9000 9000 x 16000

and I only test landscape sizes. Maybe I'll add a portrait option
to see anisotropic behaviors.

I decided to do both, one, for symmetry (and there are still some
applications for portrait mode), and two, to see whether that has
an effect on behavior (indeed my latest algorithm is anisotropic,
so it is good to test the flipped sizes).

with these patterns:

fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center

If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).

Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.

My suite is about the same, with the following exceptions:
1. I didn't add donut yet
2. + 3 greeds with cell size 2, 3 and 4
3. + fractal tree

By "fractal" I meant fractal tree. Sorry if that was confusing.

4. + entire field starting from corner

I used to do that but took it out as redundant. I've added
it back now. :)

It seems, neither of us tests the cases in which linear dimensions
of the shape are much smaller than those of the field.

Shouldn't make a difference (for any of the algorithms shown) as
long as there is at least a 1 pixel border around the pattern.
Maybe I will add that variation (ick, a lot of work). By the
way the donut pattern already has a 1 pixel border, ie, does
not touch any edge.

static void make_grid(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c, int cell_sz)
{
    for (int y = 0; y < height; ++y) {
        unsigned char* p = &image[y*width];
        if (y % cell_sz == 0) {
            memset(p, pen_c, width);
        } else {
            for (int x = 0; x < width; ++x)
                p[x] = x % cell_sz ? background_c : pen_c;
        }
    }
}

Ahh, this is what you meant by greed. A nice set of patterns.
I wrote a variation where the "line width" as well as the
"hole width" is variable, and added a bunch of those to my
tests (so a full timing suite now runs for several hours).
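
A variation along those lines might look like the sketch below. It is only a guess at the idea, not the code actually used for the tests: the name make_grid_wide, the parameter order and the returned pen-pixel count are assumptions. With line_w = 1 and hole_w = cell_sz - 1 it produces the same image as make_grid above.

```c
#include <assert.h>
#include <stddef.h>

/* Grid pattern with adjustable line and hole widths.
 * Hypothetical variation on make_grid; name and parameters are assumptions.
 * A pixel gets pen_c when its row or its column falls inside a line band,
 * otherwise background_c. Returns the number of pen pixels written. */
static size_t make_grid_wide(unsigned char *image, int width, int height,
                             unsigned char background_c, unsigned char pen_c,
                             int line_w, int hole_w)
{
    assert(image && width > 0 && height > 0 && line_w > 0 && hole_w >= 0);
    int const period = line_w + hole_w;     /* one line band plus one hole */
    size_t pen_count = 0;
    for (int y = 0; y < height; ++y) {
        unsigned char *p = &image[(size_t)y * width];
        int const row_on_line = y % period < line_w;
        for (int x = 0; x < width; ++x) {
            int const on_line = row_on_line || x % period < line_w;
            p[x] = on_line ? pen_c : background_c;
            pen_count += on_line;
        }
    }
    return pen_count;
}
```

For a 200 x 200 image with one-pixel lines and holes of 1, 2 and 3 pixels, the pen counts come out as 30000, 22311 and 17500, which agree with the "number of points to recolor" figures reported for greed(2), greed(3) and greed(4) at 200 x 200 in the results posted later in the thread.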


From Michael S@21:1/5 to Tim Rentsch on Sat Apr 13 20:26:39 2024

On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even
on Intel Skylake which generally serves as equalizer between
different algorithms, the speed advantage of 2x2 stack is
significant.

I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.

This particular server is turned off right now.
Hopefully I will be able to test on it next Monday.
It would help if, in the meantime, you pointed me to the specific post
with the code.


From Tim Rentsch@21:1/5 to Michael S on Sat Apr 13 10:54:46 2024

Michael S <[email protected]> writes:

On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <[email protected]> wrote:

I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.

This particular server is turned off right now.
Hopefully I will be able to test on it next Monday.
It would help if, in the meantime, you pointed me to the specific post
with the code.

Does this help? Message-ID: <[email protected]>


From Michael S@21:1/5 to Tim Rentsch on Sat Apr 13 23:11:59 2024

On Sat, 13 Apr 2024 10:54:46 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <[email protected]> wrote:

I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.

This particular server is turned off right now.
Hopefully I will be able to test on it next Monday.
It would help if, in the meantime, you pointed me to the specific post
with the code.

Does this help? Message-ID: <[email protected]>

Yes, it does.


From Michael S@21:1/5 to Tim Rentsch on Wed Apr 17 00:47:22 2024

On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <[email protected]> wrote:

Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.

Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even
on Intel Skylake which generally serves as equalizer between
different algorithms, the speed advantage of 2x2 stack is
significant.

I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.

Finally found the time for speed measurements.

I tested four algorithms:
1. stack_2x2 - stack-like processing where each element is a 2x2 rectangle,
with Lot's wife amendment.
2. stack_timr1 - first variant of stack by Tim Rentsch
3. stack_timr2 - second variant of stack by Tim Rentsch
4. queue_timr - "take no prisoners" queue by Tim Rentsch, the one with a
power-of-two circular buffer, (x,y) packed into 32 bits and an inner loop
optimized for solid shapes.

Tests were run on four CPUs:
1. IVB - Intel Core i7-3570 at 3700 MHz. As CPUs go, a rather old thing.
2. HSW - Intel Xeon E3-1271 v3 at 4000 MHz. Only a couple of years
younger than the above.
3. SKC - Intel Xeon E-2176G at 4250 MHz. Significantly younger, but the
microarchitecture has existed since 2015.
4. ZN3 - AMD EPYC 7543P at 3700 MHz. The only one on my roster whose
microarchitecture can be considered relatively modern.

As you can see, with the exception of the oldest CPU, your 2nd stack variant
is not an improvement over the first.

What surprised me after I put all the results together was the poor showing
of SKC. I can't remember any other of my microbenchmarks (and I do
plenty) where this CPU was so decisively beaten by its older cousin.

The columns are as follows:
1. Shape name
2. Starting point (x,y)
3. Number of points to recolor
4. total test duration, seconds
5. time per pixel - normalized to image area, nsec
6. time per pixel - normalized to number of points to recolor, nsec
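
The two normalized columns can be reproduced from the total duration, assuming the number after the image size (for example the 5002 in "[200 x 200] * 5002") is the repetition count. That reading is an inference from the figures, not something stated in the post, and the function names below are made up for illustration.

```c
#include <assert.h>

/* Nanoseconds per pixel, normalized to the whole image area.
 * total_sec covers `reps` repetitions of one fill on a width x height image. */
static double ns_per_image_pixel(double total_sec, long reps,
                                 long width, long height)
{
    assert(reps > 0 && width > 0 && height > 0);
    return total_sec * 1e9 / ((double)reps * (double)width * (double)height);
}

/* Nanoseconds per pixel, normalized to the number of recolored points. */
static double ns_per_recolored_pixel(double total_sec, long reps, long recolored)
{
    assert(reps > 0 && recolored > 0);
    return total_sec * 1e9 / ((double)reps * (double)recolored);
}
```

For the 200 x 200 standing snake on IVB (20100 points, 0.382 s total), this gives 0.382e9 / (5002 * 40000), about 1.91 ns per image pixel, and 0.382e9 / (5002 * 20100), about 3.80 ns per recolored point, matching the last two columns of that row.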

IVB,stack_2x2:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.547 2.73 2.73
Solid square ( 0, 0) 475 0.537 2.68 2.68
standing snake-like shape ( 0, 0) 259 0.522 2.61 4.79
prostrate snake-like shape ( 0, 0) 259 0.528 2.64 4.84
slalom shape ( 0, 0) 233 0.459 2.29 4.68
slalom shape(rotated) ( 0, 0) 223 0.455 2.27 4.85
cross-in-cross ( 0, 0) 403 0.515 2.57 3.04
fractal tree ( 12, 0) 247 0.469 2.34 4.51
greed(2) ( 0, 0) 367 0.558 2.79 3.61
greed(3) ( 0, 0) 283 0.463 2.31 3.89
greed(4) ( 0, 0) 223 0.399 1.99 4.25
donut ( 23, 9) 238 0.305 1.52 3.04
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.461 2.30 2.30
Solid square ( 0, 0) 40000 0.460 2.30 2.30
standing snake-like shape ( 0, 0) 20100 0.382 1.91 3.80
prostrate snake-like shape ( 0, 0) 20100 0.474 2.37 4.71
slalom shape ( 0, 0) 19802 0.435 2.17 4.39
slalom shape(rotated) ( 0, 0) 19802 0.450 2.25 4.54
cross-in-cross ( 0, 0) 39216 0.470 2.35 2.40
fractal tree ( 99, 0) 18674 0.432 2.16 4.62
greed(2) ( 0, 0) 30000 0.458 2.29 3.05
greed(3) ( 0, 0) 22311 0.413 2.06 3.70
greed(4) ( 0, 0) 17500 0.348 1.74 3.98
donut ( 199, 100) 25830 0.315 1.57 2.44
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.450 2.24 2.24
Solid square ( 0, 0) 921600 0.450 2.24 2.24
standing snake-like shape ( 0, 0) 461160 0.371 1.85 3.69
prostrate snake-like shape ( 0, 0) 461440 0.469 2.33 4.66
slalom shape ( 0, 0) 460082 0.437 2.18 4.36
slalom shape(rotated) ( 0, 0) 460800 0.448 2.23 4.46
cross-in-cross ( 0, 0) 917616 0.452 2.25 2.26
fractal tree ( 639, 0) 445860 0.460 2.29 4.73
greed(2) ( 0, 0) 691200 0.468 2.33 3.11
greed(3) ( 0, 0) 512160 0.406 2.02 3.64
greed(4) ( 0, 0) 403200 0.344 1.71 3.91
donut (1279, 360) 655856 0.326 1.62 2.28
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.453 2.23 2.23
Solid square ( 0, 0) 2073600 0.457 2.25 2.25
standing snake-like shape ( 0, 0) 1037340 0.374 1.84 3.68
prostrate snake-like shape ( 0, 0) 1037760 0.474 2.33 4.66
slalom shape ( 0, 0) 1036800 0.443 2.18 4.36
slalom shape(rotated) ( 0, 0) 1036800 0.452 2.22 4.45
cross-in-cross ( 0, 0) 2067616 0.457 2.25 2.26
fractal tree ( 959, 0) 1034612 0.453 2.23 4.47
greed(2) ( 0, 0) 1555200 0.450 2.21 2.95
greed(3) ( 0, 0) 1152000 0.407 2.00 3.61
greed(4) ( 0, 0) 907200 0.346 1.70 3.89
donut (1919, 540) 1477788 0.326 1.60 2.25
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.500 2.32 2.32
Solid square ( 0, 0) 8294400 0.539 2.50 2.50
standing snake-like shape ( 0, 0) 4148280 0.449 2.08 4.16
prostrate snake-like shape ( 0, 0) 4149120 0.746 3.46 6.92
slalom shape ( 0, 0) 4147200 0.703 3.26 6.52
slalom shape(rotated) ( 0, 0) 4147200 0.537 2.49 4.98
cross-in-cross ( 0, 0) 8282416 0.518 2.40 2.41
fractal tree (1919, 0) 4135652 0.514 2.38 4.78
greed(2) ( 0, 0) 6220800 0.533 2.47 3.30
greed(3) ( 0, 0) 4608000 0.468 2.17 3.91
greed(4) ( 0, 0) 3628800 0.386 1.79 4.09
donut (3839,1080) 5919706 0.356 1.65 2.31

IVB,stack_timr1:
[25 x 19] * 421054
Solid square ( 12, 9) 475 1.132 5.66 5.66
Solid square ( 0, 0) 475 1.171 5.85 5.85
standing snake-like shape ( 0, 0) 259 0.724 3.62 6.64
prostrate snake-like shape ( 0, 0) 259 0.712 3.56 6.53
slalom shape ( 0, 0) 233 0.632 3.16 6.44
slalom shape(rotated) ( 0, 0) 223 0.632 3.16 6.73
cross-in-cross ( 0, 0) 403 0.931 4.65 5.49
fractal tree ( 12, 0) 247 0.537 2.68 5.16
greed(2) ( 0, 0) 367 0.866 4.33 5.60
greed(3) ( 0, 0) 283 0.724 3.62 6.08
greed(4) ( 0, 0) 223 0.618 3.09 6.58
donut ( 23, 9) 238 0.632 3.16 6.31
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.764 3.82 3.82
Solid square ( 0, 0) 40000 0.759 3.79 3.79
standing snake-like shape ( 0, 0) 20100 0.389 1.94 3.87
prostrate snake-like shape ( 0, 0) 20100 0.400 2.00 3.98
slalom shape ( 0, 0) 19802 0.388 1.94 3.92
slalom shape(rotated) ( 0, 0) 19802 0.388 1.94 3.92
cross-in-cross ( 0, 0) 39216 0.763 3.81 3.89
fractal tree ( 99, 0) 18674 0.372 1.86 3.98
greed(2) ( 0, 0) 30000 0.591 2.95 3.94
greed(3) ( 0, 0) 22311 0.445 2.22 3.99
greed(4) ( 0, 0) 17500 0.345 1.72 3.94
donut ( 199, 100) 25830 0.517 2.58 4.00
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.793 3.95 3.95
Solid square ( 0, 0) 921600 0.868 4.32 4.32
standing snake-like shape ( 0, 0) 461160 0.420 2.09 4.18
prostrate snake-like shape ( 0, 0) 461440 0.441 2.20 4.38
slalom shape ( 0, 0) 460082 0.434 2.16 4.33
slalom shape(rotated) ( 0, 0) 460800 0.429 2.14 4.27
cross-in-cross ( 0, 0) 917616 0.801 3.99 4.00
fractal tree ( 639, 0) 445860 0.389 1.94 4.00
greed(2) ( 0, 0) 691200 0.614 3.06 4.07
greed(3) ( 0, 0) 512160 0.424 2.11 3.80
greed(4) ( 0, 0) 403200 0.334 1.66 3.80
donut (1279, 360) 655856 0.572 2.85 4.00
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.793 3.90 3.90
Solid square ( 0, 0) 2073600 0.909 4.47 4.47
standing snake-like shape ( 0, 0) 1037340 0.415 2.04 4.08
prostrate snake-like shape ( 0, 0) 1037760 0.442 2.18 4.35
slalom shape ( 0, 0) 1036800 0.431 2.12 4.24
slalom shape(rotated) ( 0, 0) 1036800 0.425 2.09 4.18
cross-in-cross ( 0, 0) 2067616 0.843 4.15 4.16
fractal tree ( 959, 0) 1034612 0.395 1.94 3.90
greed(2) ( 0, 0) 1555200 0.614 3.02 4.03
greed(3) ( 0, 0) 1152000 0.430 2.12 3.81
greed(4) ( 0, 0) 907200 0.341 1.68 3.84
donut (1919, 540) 1477788 0.571 2.81 3.94
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.923 4.28 4.28
Solid square ( 0, 0) 8294400 1.109 5.14 5.14
standing snake-like shape ( 0, 0) 4148280 0.521 2.42 4.83
prostrate snake-like shape ( 0, 0) 4149120 1.186 5.50 10.99
slalom shape ( 0, 0) 4147200 0.938 4.35 8.70
slalom shape(rotated) ( 0, 0) 4147200 0.545 2.53 5.05
cross-in-cross ( 0, 0) 8282416 0.999 4.63 4.64
fractal tree (1919, 0) 4135652 0.433 2.01 4.03
greed(2) ( 0, 0) 6220800 0.738 3.42 4.56
greed(3) ( 0, 0) 4608000 0.529 2.45 4.42
greed(4) ( 0, 0) 3628800 0.417 1.93 4.42
donut (3839,1080) 5919706 0.666 3.09 4.33

IVB,stack_timr2:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.963 4.81 4.81
Solid square ( 0, 0) 475 0.990 4.95 4.95
standing snake-like shape ( 0, 0) 259 0.615 3.07 5.64
prostrate snake-like shape ( 0, 0) 259 0.673 3.36 6.17
slalom shape ( 0, 0) 233 0.761 3.80 7.76
slalom shape(rotated) ( 0, 0) 223 0.815 4.07 8.68
cross-in-cross ( 0, 0) 403 1.160 5.80 6.84
fractal tree ( 12, 0) 247 0.740 3.70 7.12
greed(2) ( 0, 0) 367 1.093 5.46 7.07
greed(3) ( 0, 0) 283 0.762 3.81 6.39
greed(4) ( 0, 0) 223 0.621 3.10 6.61
donut ( 23, 9) 238 0.753 3.76 7.51
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.587 2.93 2.93
Solid square ( 0, 0) 40000 0.588 2.94 2.94
standing snake-like shape ( 0, 0) 20100 0.299 1.49 2.97
prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
slalom shape ( 0, 0) 19802 0.481 2.40 4.86
slalom shape(rotated) ( 0, 0) 19802 0.586 2.93 5.92
cross-in-cross ( 0, 0) 39216 0.609 3.04 3.10
fractal tree ( 99, 0) 18674 0.539 2.69 5.77
greed(2) ( 0, 0) 30000 0.909 4.54 6.06
greed(3) ( 0, 0) 22311 0.468 2.34 4.19
greed(4) ( 0, 0) 17500 0.371 1.85 4.24
donut ( 199, 100) 25830 0.418 2.09 3.24
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.602 3.00 3.00
Solid square ( 0, 0) 921600 0.741 3.69 3.69
standing snake-like shape ( 0, 0) 461160 0.326 1.62 3.24
prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
slalom shape ( 0, 0) 460082 0.519 2.58 5.17
slalom shape(rotated) ( 0, 0) 460800 0.626 3.12 6.23
cross-in-cross ( 0, 0) 917616 0.666 3.31 3.33
fractal tree ( 639, 0) 445860 0.565 2.81 5.81
greed(2) ( 0, 0) 691200 0.938 4.67 6.23
greed(3) ( 0, 0) 512160 0.491 2.44 4.40
greed(4) ( 0, 0) 403200 0.352 1.75 4.00
donut (1279, 360) 655856 0.450 2.24 3.15
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.611 3.01 3.01
Solid square ( 0, 0) 2073600 0.759 3.74 3.74
standing snake-like shape ( 0, 0) 1037340 0.330 1.62 3.25
prostrate snake-like shape ( 0, 0) 1037760 0.350 1.72 3.44
slalom shape ( 0, 0) 1036800 0.525 2.58 5.17
slalom shape(rotated) ( 0, 0) 1036800 0.636 3.13 6.26
cross-in-cross ( 0, 0) 2067616 0.674 3.32 3.33
fractal tree ( 959, 0) 1034612 0.605 2.98 5.97
greed(2) ( 0, 0) 1555200 0.923 4.54 6.06
greed(3) ( 0, 0) 1152000 0.463 2.28 4.10
greed(4) ( 0, 0) 907200 0.359 1.77 4.04
donut (1919, 540) 1477788 0.431 2.12 2.98
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.703 3.26 3.26
Solid square ( 0, 0) 8294400 0.847 3.93 3.93
standing snake-like shape ( 0, 0) 4148280 0.400 1.85 3.71
prostrate snake-like shape ( 0, 0) 4149120 0.815 3.78 7.55
slalom shape ( 0, 0) 4147200 0.871 4.04 8.08
slalom shape(rotated) ( 0, 0) 4147200 0.734 3.40 6.81
cross-in-cross ( 0, 0) 8282416 0.774 3.59 3.59
fractal tree (1919, 0) 4135652 0.658 3.05 6.12
greed(2) ( 0, 0) 6220800 1.023 4.74 6.32
greed(3) ( 0, 0) 4608000 0.554 2.57 4.62
greed(4) ( 0, 0) 3628800 0.451 2.09 4.78
donut (3839,1080) 5919706 0.498 2.31 3.24

IVB,queue_timr:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.828 4.14 4.14
Solid square ( 0, 0) 475 0.890 4.45 4.45
standing snake-like shape ( 0, 0) 259 0.642 3.21 5.89
prostrate snake-like shape ( 0, 0) 259 0.709 3.54 6.50
slalom shape ( 0, 0) 233 0.589 2.94 6.00
slalom shape(rotated) ( 0, 0) 223 0.573 2.86 6.10
cross-in-cross ( 0, 0) 403 0.713 3.56 4.20
fractal tree ( 12, 0) 247 0.448 2.24 4.31
greed(2) ( 0, 0) 367 0.675 3.37 4.37
greed(3) ( 0, 0) 283 0.512 2.56 4.30
greed(4) ( 0, 0) 223 0.409 2.04 4.36
donut ( 23, 9) 238 0.439 2.19 4.38

[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.893 4.46 4.46
Solid square ( 0, 0) 40000 0.786 3.93 3.93
standing snake-like shape ( 0, 0) 20100 0.555 2.77 5.52
prostrate snake-like shape ( 0, 0) 20100 0.557 2.78 5.54
slalom shape ( 0, 0) 19802 0.571 2.85 5.76
slalom shape(rotated) ( 0, 0) 19802 0.548 2.74 5.53
cross-in-cross ( 0, 0) 39216 0.736 3.68 3.75
fractal tree ( 99, 0) 18674 0.569 2.84 6.09
greed(2) ( 0, 0) 30000 0.615 3.07 4.10
greed(3) ( 0, 0) 22311 0.453 2.26 4.06
greed(4) ( 0, 0) 17500 0.357 1.78 4.08
donut ( 199, 100) 25830 0.531 2.65 4.11

[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.785 3.91 3.91
Solid square ( 0, 0) 921600 0.761 3.79 3.79
standing snake-like shape ( 0, 0) 461160 0.551 2.74 5.48
prostrate snake-like shape ( 0, 0) 461440 0.552 2.75 5.49
slalom shape ( 0, 0) 460082 0.564 2.81 5.62
slalom shape(rotated) ( 0, 0) 460800 0.557 2.77 5.54
cross-in-cross ( 0, 0) 917616 0.755 3.76 3.77
fractal tree ( 639, 0) 445860 0.448 2.23 4.61
greed(2) ( 0, 0) 691200 0.645 3.21 4.28
greed(3) ( 0, 0) 512160 0.481 2.39 4.31
greed(4) ( 0, 0) 403200 0.377 1.88 4.29
donut (1279, 360) 655856 0.572 2.85 4.00

[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.854 4.20 4.20
Solid square ( 0, 0) 2073600 0.829 4.08 4.08
standing snake-like shape ( 0, 0) 1037340 0.557 2.74 5.48
prostrate snake-like shape ( 0, 0) 1037760 0.574 2.82 5.64
slalom shape ( 0, 0) 1036800 0.583 2.87 5.74
slalom shape(rotated) ( 0, 0) 1036800 0.563 2.77 5.54
cross-in-cross ( 0, 0) 2067616 0.822 4.05 4.06
fractal tree ( 959, 0) 1034612 0.468 2.30 4.62
greed(2) ( 0, 0) 1555200 0.664 3.27 4.36
greed(3) ( 0, 0) 1152000 0.483 2.38 4.28
greed(4) ( 0, 0) 907200 0.389 1.91 4.38
donut (1919, 540) 1477788 0.617 3.04 4.26

[3840 x 2160] * 26
Solid square (1920,1080) 8294400 1.407 6.52 6.52
Solid square ( 0, 0) 8294400 1.555 7.21 7.21
standing snake-like shape ( 0, 0) 4148280 0.596 2.76 5.53
prostrate snake-like shape ( 0, 0) 4149120 0.851 3.95 7.89
slalom shape ( 0, 0) 4147200 0.802 3.72 7.44
slalom shape(rotated) ( 0, 0) 4147200 0.600 2.78 5.56
cross-in-cross ( 0, 0) 8282416 1.522 7.06 7.07
fractal tree (1919, 0) 4135652 1.151 5.34 10.70
greed(2) ( 0, 0) 6220800 1.410 6.54 8.72
greed(3) ( 0, 0) 4608000 1.450 6.72 12.10
greed(4) ( 0, 0) 3628800 1.432 6.64 15.18
donut (3839,1080) 5919706 1.114 5.17 7.24

HSW,stack_2x2:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.391 1.95 1.95
Solid square ( 0, 0) 475 0.402 2.01 2.01
standing snake-like shape ( 0, 0) 259 0.389 1.94 3.57
prostrate snake-like shape ( 0, 0) 259 0.351 1.75 3.22
slalom shape ( 0, 0) 233 0.360 1.80 3.67
slalom shape(rotated) ( 0, 0) 223 0.366 1.83 3.90
cross-in-cross ( 0, 0) 403 0.385 1.92 2.27
fractal tree ( 12, 0) 247 0.382 1.91 3.67
greed(2) ( 0, 0) 367 0.416 2.08 2.69
greed(3) ( 0, 0) 283 0.361 1.80 3.03
greed(4) ( 0, 0) 223 0.306 1.53 3.26
donut ( 23, 9) 238 0.225 1.12 2.25
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.344 1.72 1.72
Solid square ( 0, 0) 40000 0.343 1.71 1.71
standing snake-like shape ( 0, 0) 20100 0.274 1.37 2.73
prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
slalom shape ( 0, 0) 19802 0.333 1.66 3.36
slalom shape(rotated) ( 0, 0) 19802 0.344 1.72 3.47
cross-in-cross ( 0, 0) 39216 0.352 1.76 1.79
fractal tree ( 99, 0) 18674 0.497 2.48 5.32
greed(2) ( 0, 0) 30000 0.338 1.69 2.25
greed(3) ( 0, 0) 22311 0.317 1.58 2.84
greed(4) ( 0, 0) 17500 0.247 1.23 2.82
donut ( 199, 100) 25830 0.237 1.18 1.83
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.334 1.66 1.66
Solid square ( 0, 0) 921600 0.332 1.65 1.65
standing snake-like shape ( 0, 0) 461160 0.263 1.31 2.62
prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
slalom shape ( 0, 0) 460082 0.352 1.75 3.51
slalom shape(rotated) ( 0, 0) 460800 0.346 1.72 3.44
cross-in-cross ( 0, 0) 917616 0.336 1.67 1.68
fractal tree ( 639, 0) 445860 0.437 2.18 4.50
greed(2) ( 0, 0) 691200 0.326 1.62 2.16
greed(3) ( 0, 0) 512160 0.303 1.51 2.71
greed(4) ( 0, 0) 403200 0.245 1.22 2.79
donut (1279, 360) 655856 0.243 1.21 1.70
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.337 1.66 1.66
Solid square ( 0, 0) 2073600 0.335 1.65 1.65
standing snake-like shape ( 0, 0) 1037340 0.265 1.30 2.61
prostrate snake-like shape ( 0, 0) 1037760 0.333 1.64 3.27
slalom shape ( 0, 0) 1036800 0.344 1.69 3.39
slalom shape(rotated) ( 0, 0) 1036800 0.342 1.68 3.37
cross-in-cross ( 0, 0) 2067616 0.338 1.66 1.67
fractal tree ( 959, 0) 1034612 0.472 2.32 4.66
greed(2) ( 0, 0) 1555200 0.328 1.61 2.15
greed(3) ( 0, 0) 1152000 0.305 1.50 2.70
greed(4) ( 0, 0) 907200 0.245 1.21 2.76
donut (1919, 540) 1477788 0.244 1.20 1.68
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.375 1.74 1.74
Solid square ( 0, 0) 8294400 0.402 1.86 1.86
standing snake-like shape ( 0, 0) 4148280 0.323 1.50 2.99
prostrate snake-like shape ( 0, 0) 4149120 0.561 2.60 5.20
slalom shape ( 0, 0) 4147200 0.574 2.66 5.32
slalom shape(rotated) ( 0, 0) 4147200 0.407 1.89 3.77
cross-in-cross ( 0, 0) 8282416 0.384 1.78 1.78
fractal tree (1919, 0) 4135652 0.508 2.36 4.72
greed(2) ( 0, 0) 6220800 0.395 1.83 2.44
greed(3) ( 0, 0) 4608000 0.350 1.62 2.92
greed(4) ( 0, 0) 3628800 0.275 1.28 2.91
donut (3839,1080) 5919706 0.262 1.21 1.70

HSW,stack_timr1:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.801 4.00 4.00
Solid square ( 0, 0) 475 0.845 4.22 4.22
standing snake-like shape ( 0, 0) 259 0.511 2.55 4.69
prostrate snake-like shape ( 0, 0) 259 0.516 2.58 4.73
slalom shape ( 0, 0) 233 0.520 2.60 5.30
slalom shape(rotated) ( 0, 0) 223 0.476 2.38 5.07
cross-in-cross ( 0, 0) 403 0.694 3.47 4.09
fractal tree ( 12, 0) 247 0.414 2.07 3.98
greed(2) ( 0, 0) 367 0.645 3.22 4.17
greed(3) ( 0, 0) 283 0.552 2.76 4.63
greed(4) ( 0, 0) 223 0.476 2.38 5.07
donut ( 23, 9) 238 0.469 2.34 4.68
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.447 2.23 2.23
Solid square ( 0, 0) 40000 0.444 2.22 2.22
standing snake-like shape ( 0, 0) 20100 0.229 1.14 2.28
prostrate snake-like shape ( 0, 0) 20100 0.250 1.25 2.49
slalom shape ( 0, 0) 19802 0.270 1.35 2.73
slalom shape(rotated) ( 0, 0) 19802 0.260 1.30 2.62
cross-in-cross ( 0, 0) 39216 0.459 2.29 2.34
fractal tree ( 99, 0) 18674 0.260 1.30 2.78
greed(2) ( 0, 0) 30000 0.387 1.93 2.58
greed(3) ( 0, 0) 22311 0.295 1.47 2.64
greed(4) ( 0, 0) 17500 0.231 1.15 2.64
donut ( 199, 100) 25830 0.316 1.58 2.45
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.457 2.27 2.27
Solid square ( 0, 0) 921600 0.515 2.56 2.56
standing snake-like shape ( 0, 0) 461160 0.248 1.23 2.47
prostrate snake-like shape ( 0, 0) 461440 0.321 1.60 3.19
slalom shape ( 0, 0) 460082 0.312 1.55 3.11
slalom shape(rotated) ( 0, 0) 460800 0.285 1.42 2.84
cross-in-cross ( 0, 0) 917616 0.466 2.32 2.33
fractal tree ( 639, 0) 445860 0.271 1.35 2.79
greed(2) ( 0, 0) 691200 0.406 2.02 2.69
greed(3) ( 0, 0) 512160 0.278 1.38 2.49
greed(4) ( 0, 0) 403200 0.223 1.11 2.54
donut (1279, 360) 655856 0.335 1.67 2.34
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.454 2.23 2.23
Solid square ( 0, 0) 2073600 0.554 2.73 2.73
standing snake-like shape ( 0, 0) 1037340 0.240 1.18 2.36
prostrate snake-like shape ( 0, 0) 1037760 0.317 1.56 3.12
slalom shape ( 0, 0) 1036800 0.306 1.51 3.01
slalom shape(rotated) ( 0, 0) 1036800 0.293 1.44 2.88
cross-in-cross ( 0, 0) 2067616 0.511 2.51 2.52
fractal tree ( 959, 0) 1034612 0.276 1.36 2.72
greed(2) ( 0, 0) 1555200 0.402 1.98 2.64
greed(3) ( 0, 0) 1152000 0.283 1.39 2.51
greed(4) ( 0, 0) 907200 0.224 1.10 2.52
donut (1919, 540) 1477788 0.331 1.63 2.29
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.566 2.62 2.62
Solid square ( 0, 0) 8294400 0.708 3.28 3.28
standing snake-like shape ( 0, 0) 4148280 0.337 1.56 3.12
prostrate snake-like shape ( 0, 0) 4149120 0.930 4.31 8.62
slalom shape ( 0, 0) 4147200 0.735 3.41 6.82
slalom shape(rotated) ( 0, 0) 4147200 0.387 1.79 3.59
cross-in-cross ( 0, 0) 8282416 0.630 2.92 2.93
fractal tree (1919, 0) 4135652 0.302 1.40 2.81
greed(2) ( 0, 0) 6220800 0.516 2.39 3.19
greed(3) ( 0, 0) 4608000 0.367 1.70 3.06
greed(4) ( 0, 0) 3628800 0.286 1.33 3.03
donut (3839,1080) 5919706 0.398 1.85 2.59

HSW,stack_timr2:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.746 3.73 3.73
Solid square ( 0, 0) 475 0.781 3.90 3.90
standing snake-like shape ( 0, 0) 259 0.479 2.39 4.39
prostrate snake-like shape ( 0, 0) 259 0.525 2.62 4.81
slalom shape ( 0, 0) 233 0.546 2.73 5.57
slalom shape(rotated) ( 0, 0) 223 0.532 2.66 5.67
cross-in-cross ( 0, 0) 403 0.804 4.02 4.74
fractal tree ( 12, 0) 247 0.466 2.33 4.48
greed(2) ( 0, 0) 367 0.668 3.34 4.32
greed(3) ( 0, 0) 283 0.552 2.76 4.63
greed(4) ( 0, 0) 223 0.446 2.23 4.75
donut ( 23, 9) 238 0.521 2.60 5.20
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.432 2.16 2.16
Solid square ( 0, 0) 40000 0.430 2.15 2.15
standing snake-like shape ( 0, 0) 20100 0.223 1.11 2.22
prostrate snake-like shape ( 0, 0) 20100 0.271 1.35 2.70
slalom shape ( 0, 0) 19802 0.360 1.80 3.63
slalom shape(rotated) ( 0, 0) 19802 0.385 1.92 3.89
cross-in-cross ( 0, 0) 39216 0.468 2.34 2.39
fractal tree ( 99, 0) 18674 0.457 2.28 4.89
greed(2) ( 0, 0) 30000 0.468 2.34 3.12
greed(3) ( 0, 0) 22311 0.353 1.76 3.16
greed(4) ( 0, 0) 17500 0.275 1.37 3.14
donut ( 199, 100) 25830 0.340 1.70 2.63
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.450 2.24 2.24
Solid square ( 0, 0) 921600 0.560 2.79 2.79
standing snake-like shape ( 0, 0) 461160 0.244 1.21 2.43
prostrate snake-like shape ( 0, 0) 461440 0.282 1.40 2.80
slalom shape ( 0, 0) 460082 0.412 2.05 4.11
slalom shape(rotated) ( 0, 0) 460800 0.414 2.06 4.12
cross-in-cross ( 0, 0) 917616 0.497 2.47 2.48
fractal tree ( 639, 0) 445860 0.426 2.12 4.38
greed(2) ( 0, 0) 691200 0.496 2.47 3.29
greed(3) ( 0, 0) 512160 0.373 1.86 3.34
greed(4) ( 0, 0) 403200 0.282 1.40 3.21
donut (1279, 360) 655856 0.312 1.55 2.18
[1920 x 1080] * 98
Solid square ( 960, 540) 2073600 0.437 2.15 2.15
Solid square ( 0, 0) 2073600 0.559 2.75 2.75
standing snake-like shape ( 0, 0) 1037340 0.235 1.16 2.31
prostrate snake-like shape ( 0, 0) 1037760 0.274 1.35 2.69
slalom shape ( 0, 0) 1036800 0.396 1.95 3.90
slalom shape(rotated) ( 0, 0) 1036800 0.407 2.00 4.01
cross-in-cross ( 0, 0) 2067616 0.490 2.41 2.42
fractal tree ( 959, 0) 1034612 0.457 2.25 4.51
greed(2) ( 0, 0) 1555200 0.486 2.39 3.19
greed(3) ( 0, 0) 1152000 0.346 1.70 3.06
greed(4) ( 0, 0) 907200 0.268 1.32 3.01
donut (1919, 540) 1477788 0.297 1.46 2.05
[3840 x 2160] * 26
Solid square (1920,1080) 8294400 0.523 2.43 2.43
Solid square ( 0, 0) 8294400 0.649 3.01 3.01
standing snake-like shape ( 0, 0) 4148280 0.307 1.42 2.85
prostrate snake-like shape ( 0, 0) 4149120 0.614 2.85 5.69
slalom shape ( 0, 0) 4147200 0.666 3.09 6.18
slalom shape(rotated) ( 0, 0) 4147200 0.495 2.30 4.59
cross-in-cross ( 0, 0) 8282416 0.585 2.71 2.72
fractal tree (1919, 0) 4135652 0.435 2.02 4.05
greed(2) ( 0, 0) 6220800 0.583 2.70 3.60
greed(3) ( 0, 0) 4608000 0.426 1.98 3.56
greed(4) ( 0, 0) 3628800 0.342 1.59 3.62
donut (3839,1080) 5919706 0.369 1.71 2.40

HSW,queue_timr:
[25 x 19] * 421054
Solid square ( 12, 9) 475 0.698 3.49 3.49
Solid square ( 0, 0) 475 0.709 3.54 3.54
standing snake-like shape ( 0, 0) 259 0.517 2.58 4.74
prostrate snake-like shape ( 0, 0) 259 0.518 2.59 4.75
slalom shape ( 0, 0) 233 0.478 2.39 4.87
slalom shape(rotated) ( 0, 0) 223 0.447 2.23 4.76
cross-in-cross ( 0, 0) 403 0.577 2.88 3.40
fractal tree ( 12, 0) 247 0.374 1.87 3.60
greed(2) ( 0, 0) 367 0.515 2.57 3.33
greed(3) ( 0, 0) 283 0.409 2.04 3.43
greed(4) ( 0, 0) 223 0.336 1.68 3.58
donut ( 23, 9) 238 0.379 1.89 3.78
[200 x 200] * 5002
Solid square ( 100, 100) 40000 0.662 3.31 3.31
Solid square ( 0, 0) 40000 0.619 3.09 3.09
standing snake-like shape ( 0, 0) 20100 0.443 2.21 4.41
prostrate snake-like shape ( 0, 0) 20100 0.446 2.23 4.44
slalom shape ( 0, 0) 19802 0.440 2.20 4.44
slalom shape(rotated) ( 0, 0) 19802 0.439 2.19 4.43
cross-in-cross ( 0, 0) 39216 0.629 3.14 3.21
fractal tree ( 99, 0) 18674 0.618 3.09 6.62
greed(2) ( 0, 0) 30000 0.477 2.38 3.18
greed(3) ( 0, 0) 22311 0.364 1.82 3.26
greed(4) ( 0, 0) 17500 0.289 1.44 3.30
donut ( 199, 100) 25830 0.482 2.41 3.73
[1280 x 720] * 218
Solid square ( 640, 360) 921600 0.669 3.33 3.33
Solid square ( 0, 0) 921600 0.628 3.13 3.13
standing snake-like shape ( 0, 0) 461160 0.444 2.21 4.42
prostrate snake-like shape ( 0, 0) 461440 0.445 2.21 4.42
slalom shape ( 0, 0) 460082 0.444 2.21 4.43

[continued in next message]


From Tim Rentsch@21:1/5 to Michael S on Wed Apr 17 10:47:25 2024

Michael S <[email protected]> writes:

[...]

Finally found the time for speed measurements. [...]

I got these. Thank you.

The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
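
For anyone else who wants to post-process the tables, here is one possible way to read a data row. It is only a sketch: the struct and function names are invented, and it assumes every data row ends with "(x, y) points seconds ns ns" as in the rows posted. Shape names may contain parentheses themselves, so the starting point is located from the last '(' on the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one table row. */
struct bench_row {
    char   shape[64];
    int    x, y;
    long   points;
    double seconds, ns_per_image_pixel, ns_per_point;
};

/* Parse a row such as
     "slalom shape(rotated) ( 0, 0) 223 0.455 2.27 4.85"
   Returns 1 on success, 0 for header lines and anything else
   that does not fit the pattern. */
int parse_bench_row(const char *line, struct bench_row *out)
{
    const char *open = strrchr(line, '(');
    if (open == NULL || open == line)
        return 0;
    if (sscanf(open, "(%d ,%d ) %ld %lf %lf %lf",
               &out->x, &out->y, &out->points, &out->seconds,
               &out->ns_per_image_pixel, &out->ns_per_point) != 6)
        return 0;

    /* Everything before the '(' is the shape name; trim trailing blanks. */
    size_t len = (size_t)(open - line);
    while (len > 0 && line[len - 1] == ' ')
        --len;
    if (len >= sizeof out->shape)
        len = sizeof out->shape - 1;
    memcpy(out->shape, line, len);
    out->shape[len] = '\0';
    return 1;
}
```

Section headers such as "IVB,stack_2x2:" and "[25 x 19] * 421054" contain no '(' and are rejected, so they have to be handled separately.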

The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.

Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.


From Michael S@21:1/5 to Tim Rentsch on Wed Apr 17 22:41:26 2024

On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Finally found the time for speed measurements. [...]

I got these. Thank you.

The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.

The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.

What are your absolute results?
Are they much faster, much slower or similar to mine?
It would also help if you could find out the characteristics of your
test hardware.

Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.

I am still interested in a combination of speed that does not suck
with an O(N) worst-case memory footprint.
I already have a couple of variants of the former, but so far they are
all unreasonably slow - ~5 times slower than the best.


From Tim Rentsch@21:1/5 to Michael S on Fri Apr 19 14:59:20 2024

Michael S <[email protected]> writes:

On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Finally found the time for speed measurements. [...]

I got these. Thank you.

The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.

The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.

What are your absolute results?
Are they much faster, much slower or similar to mine?
It would also help if you could find out the characteristics of your
test hardware.

I think trying to look at those wouldn't tell me anything
helpful. Too many unknowns. And still no way to test or
measure any changes to the various algorithms.

Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.

I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the former,

Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?

I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.

but so
far they are all unreasonably slow - ~5 times slower than
the best.

I'm no longer working on the problem but I'm interested to
hear what you come up with.


From Michael S@21:1/5 to Tim Rentsch on Sat Apr 20 21:10:23 2024

On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Finally found the time for speed measurements. [...]

I got these. Thank you.

The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.

The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.

What are your absolute results?
Are they much faster, much slower or similar to mine?
It would also help if you could find out the characteristics of your
test hardware.

I think trying to look at those wouldn't tell me anything
helpful. Too many unknowns. And still no way to test or
measure any changes to the various algorithms.

Frankly, I don't understand.
If you have trouble testing on shared hardware, you can always test on
hardware that you own and fully control.
Even if it is a little old, the trends tend to be the same. At least, I
clearly see the same trends on my almost 12-year-old home PC and on a
relatively modern EPYC3.

Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.

I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the former,

Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?

I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.

No, I mean O(max(M,N)), plus possibly some logarithmic component that
loses significance as images grow bigger.
Moreover, if the bounding rectangle of the shape is A x B, then I'd like
the memory requirement to be O(max(A,B)), but so far that does not
appear to be possible, or at least not without significant complications
and further slowdown. So, as an intermediate goal, I am willing to
accept that the allocation is O(max(M,N)) as long as the amount of
touched memory is O(max(A,B)).

but so
far they are all unreasonably slow - ~5 times slower than
the best.

I'm no longer working on the problem but I'm interested to
hear what you come up with.


From Michael S@21:1/5 to Michael S on Thu Apr 25 17:56:06 2024

On Sat, 20 Apr 2024 21:10:23 +0300
Michael S <[email protected]> wrote:

On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <[email protected]> wrote:

Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?

I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.

No, I mean O(max(M,N)), plus possibly some logarithmic component that
loses significance as images grow bigger.
Moreover, if the bounding rectangle of the shape is A x B, then I'd like
the memory requirement to be O(max(A,B)), but so far that does not
appear to be possible, or at least not without significant
complications and further slowdown. So, as an intermediate goal, I am
willing to accept that the allocation is O(max(M,N)) as long as the
amount of touched memory is O(max(A,B)).

but so
far they are all unreasonably slow - ~5 times slower than
the best.

I'm no longer working on the problem but I'm interested to
hear what you come up with.

Here is what I had in mind.
I tried to optimize as little as I could in order to keep it as simple
as I could. Unfortunately, I am not particularly good at that, so the
code still contains a few unnecessary "tricks" that make it a little
harder to understand.
The code uses VLAs and recursion for the same purpose of making it less
tricky.
If desired, the memory footprint could easily be reduced by a factor of
8 by using packed bit arrays instead of arrays of _Bool.
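
The packed form could look roughly like the sketch below. These helpers are hypothetical and are not used by the code in this post; the factor of 8 assumes sizeof(_Bool) == 1 and CHAR_BIT == 8, which is common but not guaranteed.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bytes needed to hold nbits one-bit flags. */
#define BITSET_BYTES(nbits) (((size_t)(nbits) + CHAR_BIT - 1) / CHAR_BIT)

/* Set flag number i. */
static inline void bitset_set(unsigned char *bits, size_t i)
{
    bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Return 1 if flag number i is set, 0 otherwise. */
static inline int bitset_test(const unsigned char *bits, size_t i)
{
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

One complication: the code in this post keeps pointers into the middle of the todo arrays (for example &lr_todo[0][y]) and moves them with ++ and --. A bit cannot be addressed by a pointer, so a packed version would have to carry a base pointer plus a bit index instead.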

Even in this relatively crude form, the code is blazingly fast for the
majority of shapes.
Unfortunately, in the worst case (both 'slalom' shapes) the execution
time is O(max(A,B)**3), which makes it unfit as a general-purpose
routine. At the moment I don't see a solution to this problem. Overall,
it's probably a dead end.
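
When experimenting with a routine like this, it helps to compare its output against something obviously correct. The sketch below is not the method of this post. It is the usual explicit-stack 4-way fill with the same parameter list, it needs O(width*height) memory in the worst case, and the name floodfill4_reference is made up for illustration. Unlike the routine in this post, it returns 0 when old_color equals new_color.

```c
#include <stdlib.h>

typedef unsigned char Color;

/* Returns 1 if the start pixel had old_color and the area was recolored,
   0 otherwise (bad arguments, nothing to do, or allocation failure). */
int floodfill4_reference(Color *image, int width, int height, int x, int y,
                         Color old_color, Color new_color)
{
    if (width <= 0 || height <= 0 || old_color == new_color)
        return 0;
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;

    const size_t w = (size_t)width, h = (size_t)height;
    const size_t start = w * (size_t)y + (size_t)x;
    if (image[start] != old_color)
        return 0;

    /* A pixel is recolored at the moment it is pushed, so it is pushed
       at most once and w*h entries always suffice. */
    size_t *stack = malloc(w * h * sizeof *stack);
    if (stack == NULL)
        return 0;

    size_t top = 0;
    image[start] = new_color;
    stack[top++] = start;
    while (top != 0) {
        const size_t p = stack[--top];
        const size_t px = p % w, py = p / w;
        /* Indices of the four neighbours and whether each one exists.
           An index that does not exist is never dereferenced. */
        const size_t nb[4] = { p - 1, p + 1, p - w, p + w };
        const int ok[4] = { px > 0, px + 1 < w, py > 0, py + 1 < h };
        for (int i = 0; i < 4; ++i) {
            if (ok[i] && image[nb[i]] == old_color) {
                image[nb[i]] = new_color;
                stack[top++] = nb[i];
            }
        }
    }
    free(stack);
    return 1;
}
```

Running both routines on copies of the same image and comparing the buffers with memcmp is enough to catch missed or over-filled pixels.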

#include <stddef.h>
#include <string.h>

typedef unsigned char Color;

struct floodfill4_state {
Color* image;
ptrdiff_t width;
_Bool *l_todo, *r_todo, *u_todo, *d_todo;
int nx, ny;
int x, y;
Color old_color, new_color;
};

enum {
more_r = 1, more_l = 2, more_d = 4, more_u = 8,
more_lr = more_r+more_l, more_ud=more_u+more_d,
};

static
int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x,
_Bool* src_todo, _Bool* exp_todo, int lr);
static
int floodfill4_expand_ud(struct floodfill4_state* s, int exp_x,
_Bool* src_todo, _Bool* exp_todo, int ud);

int floodfill4(Color* image, int width, int height, int x, int y,
Color old_color, Color new_color)
{
if (width <= 0 || height <= 0)
return 0;

if (x < 0 || x >= width || y < 0 || y >= height)
return 0;

Color* beg = &image[(size_t)width*y+x];
if (*beg != old_color)
return 0;

*beg = new_color;
// Color* last_row = &image[(size_t)width*(height-1)];
_Bool lr_todo[2][height];
_Bool ud_todo[2][width];

struct floodfill4_state s = {
.image = beg,
.width = width,
.l_todo = &lr_todo[0][y],
.r_todo = &lr_todo[1][y],
.u_todo = &ud_todo[0][x],
.d_todo = &ud_todo[1][x],
.x = 0, .y = 0, .nx = 1, .ny = 1,
.old_color = old_color,
.new_color = new_color,
};
*s.l_todo = *s.r_todo = *s.u_todo = *s.d_todo = 1;

// expansion loop
for (int more = more_lr+more_ud; more != 0;) {
if (more & more_lr) {
_Bool exp_todo[s.ny];
do {
if (more & more_r) {
while (x+s.nx != width) {
// try to expand to the right
s.x = s.nx-1;
int ret = floodfill4_expand_lr(&s, s.nx, s.r_todo,
exp_todo, more_r);
if (!ret)
break;
more |= ret;
++s.nx;
}
more &= ~more_r;
}
if (more & more_l) {
while (x != 0) {
// try to expand to the left
s.x = 0;
int ret = floodfill4_expand_lr(&s, -1, s.l_todo, exp_todo,
more_l);
if (!ret)
break;
more |= ret;
++s.nx;
--s.image;
--s.u_todo;
--s.d_todo;
--x;
}
more &= ~more_l;
}
} while (more & more_lr);
}

if (more & more_ud) {
_Bool exp_todo[s.nx];
do {
if (more & more_d) {
while (y+s.ny != height) {
// try to expand down
s.y = s.ny-1;
int ret = floodfill4_expand_ud(&s, s.ny, s.d_todo,
exp_todo, more_d);
if (!ret)
break;
more |= ret;
++s.ny;
}
more &= ~more_d;
}
if (more & more_u) {
while (y != 0) {
// try to expand up
s.y = 0;
int ret = floodfill4_expand_ud(&s, -1, s.u_todo, exp_todo,
more_u);
if (!ret)
break;
more |= ret;
++s.ny;
s.image -= s.width;
--s.l_todo;
--s.r_todo;
--y;
}
more &= ~more_u;
}
} while (more & more_ud);
}
}
return 1;
}

// floodfill4_core - floodfill4 recursively in divide and conquer
// fashion.
// The s.*_todo arrays are initialized by the caller. floodfill4_core
// sets values in them that indicate a need for further action, but
// never clears values that were already set.
static void floodfill4_core(const struct floodfill4_state* arg)
{
const int nx = arg->nx;
const int ny = arg->ny;
if (nx+ny == 2) { // nx==ny==1
*arg->l_todo = *arg->r_todo = *arg->u_todo = *arg->d_todo = 1;
*arg->image = arg->new_color;
return;
}

struct floodfill4_state args[2];
args[0] = args[1] = *arg;
if (nx > ny) {
// split vertically
_Bool todo[2][ny];
const int hx = nx / 2;

args[0].r_todo = todo[0];
args[0].nx = hx;

args[1].image += hx;
args[1].l_todo = todo[1];
args[1].u_todo += hx;
args[1].d_todo += hx;
args[1].nx = nx-hx;

int todo_i;
int x0 = arg->x;
if (x0 < hx) { // update left field
memset(todo[0], 0, ny*sizeof(todo[0][0]));
floodfill4_core(&args[0]);
todo_i = 0;
} else { // update right field
memset(todo[1], 0, ny*sizeof(todo[0][0]));
args[1].x = x0 - hx;
floodfill4_core(&args[1]);
todo_i = 1;
}

args[0].x = hx-1;
args[1].x = 0;
for (;;) {
// look for contact points on destination edge
_Bool *todo_src = todo[todo_i];
Color *edge_dst = &arg->image[hx-todo_i];
int y;
for (y = 0; y < ny; edge_dst += arg->width, ++y) {
if (todo_src[y] && *edge_dst == arg->old_color) // contact found
break;
}
if (y == ny)
break;

todo_i = 1 - todo_i;
memset(todo[todo_i], 0, ny*sizeof(todo[0][0]));
do {
args[todo_i].y = y;
floodfill4_core(&args[todo_i]);
edge_dst += arg->width;
for (y = y+1; y < ny; edge_dst += arg->width, ++y) {
if (todo_src[y] && *edge_dst == arg->old_color) // contact found
break;
}
} while (y < ny);
}
} else { // ny >= nx
// split horizontally
_Bool todo[2][nx];
const int hy = ny / 2;
Color* edge = &arg->image[arg->width*hy];

args[0].d_todo = todo[0];
args[0].ny = hy;

args[1].image = edge;
args[1].u_todo = todo[1];
args[1].l_todo += hy;
args[1].r_todo += hy;
args[1].ny = ny-hy;

int todo_i;
int y0 = arg->y;
if (y0 < hy) { // update up field
memset(todo[0], 0, nx*sizeof(todo[0][0]));
floodfill4_core(&args[0]);
todo_i = 0;
} else { // update down field
args[1].y = y0 - hy;
memset(todo[1], 0, nx*sizeof(todo[0][0]));
floodfill4_core(&args[1]);
todo_i = 1;
}

args[0].y = hy-1;
args[1].y = 0;
for (;;) {
// look for contact points on destination edge
_Bool *todo_src = todo[todo_i];
Color *edge_dst = todo_i ? edge - arg->width : edge;
int x;
for (x = 0; x < nx; ++x) {
if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
break;
}
if (x == nx)
break;

todo_i = 1 - todo_i;
memset(todo[todo_i], 0, nx*sizeof(todo[0][0]));
do {
args[todo_i].x = x;
floodfill4_core(&args[todo_i]);
for (x = x+1; x < nx; ++x) {
if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
break;
}
} while (x < nx);
}
}
}

// return value
// 0 - not expanded
// 1 - expanded, no bounce back
// 2 - expanded, possible bounce back
static
int floodfill4_expand(
Color* pixels, // row or column
ptrdiff_t incr, // distance between adjacent points of pixels
int len,
Color old_color,
Color new_color,
_Bool* src_todo,
_Bool* dst_todo,
_Bool first)
{
for (int i = 0; i < len; pixels += incr, ++i) {
if (src_todo[i] && *pixels == old_color) {
// contact found
if (first)
memset(dst_todo, 0, len*sizeof(*dst_todo));
*pixels = new_color;
dst_todo[i] = 1;
Color* p = pixels - incr;
int k;
for (k = i-1; k >= 0 && *p == old_color; p -= incr, --k) {
*p = new_color;
dst_todo[k] = 1;
}
_Bool more = k != i-1;
for (;;) {
pixels += incr;
for (i = i+1; i < len && *pixels == old_color; pixels += incr, ++i) {
*pixels = new_color;
dst_todo[i] = 1;
more |= src_todo[i] ^ 1;
}
if (i >= len)
break;
pixels += incr;
for (i = i+1; i < len && (!src_todo[i] || *pixels != old_color);
pixels += incr, ++i);
if (i >= len)
break;
*pixels = new_color;
dst_todo[i] = 1;
Color* p = pixels - incr;
for (k = i-1; *p == old_color; --k, p -= incr) {
*p = new_color;
dst_todo[k] = 1;
}
more |= k != i-1;
}
return more ? 2 : 1;
}
}
return 0; // not expanded
}

// return value - more code
static
int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x, _Bool* src_todo, _Bool* exp_todo, int lr)
{
// try to expand to the right or left
const int ny = s->ny;
int ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
s->old_color, s->new_color, src_todo, exp_todo, 1);

if (!ret)
return 0;

int result = lr;
while (ret == 2) {
Color* p = &s->image[s->x];
_Bool contact = 0;
for (int y = 0; y < ny; p += s->width, ++y) {
if (exp_todo[y] && *p == s->old_color) {
if (!contact)
memset(src_todo, 0, ny*sizeof(*src_todo));
s->y = y;
floodfill4_core(s);
contact = 1;
}
}
if (!contact)
break;
result = more_lr+more_ud;
ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
s->old_color, s->new_color, src_todo, exp_todo, 0);
}

if ((s->u_todo[exp_x] = exp_todo[0])) result |= more_u;
if ((s->d_todo[exp_x] = exp_todo[ny-1])) result |= more_d;
memcpy(src_todo, exp_todo, ny*sizeof(*src_todo));
return result;
}

// return value - more code
static
int floodfill4_expand_ud(struct floodfill4_state* s, int exp_y, _Bool* src_todo, _Bool* exp_todo, int ud)
{
// try to expand up or down
const int nx = s->nx;
int ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
s->old_color, s->new_color, src_todo, exp_todo, 1);

if (!ret)
return 0;

int result = ud;
while (ret == 2) {
Color* p = &s->image[s->width*s->y];
_Bool contact = 0;
for (int x = 0; x < nx; ++x) {
if (exp_todo[x] && p[x] == s->old_color) {
if (!contact)
memset(src_todo, 0, nx*sizeof(*src_todo));
s->x = x;
floodfill4_core(s);
contact = 1;
}
}
if (!contact)
break;
result = more_lr+more_ud;
ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
s->old_color, s->new_color, src_todo, exp_todo, 0);
}

if ((s->l_todo[exp_y] = exp_todo[0])) result |= more_l;
if ((s->r_todo[exp_y] = exp_todo[nx-1])) result |= more_r;
memcpy(src_todo, exp_todo, nx*sizeof(*src_todo));
return result;
}

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to Michael S on Fri May 3 18:33:05 2024

On Thu, 25 Apr 2024 17:56:06 +0300
Michael S <[email protected]> wrote:

On Sat, 20 Apr 2024 21:10:23 +0300
Michael S <[email protected]> wrote:

On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <[email protected]> wrote:

Did you mean you have some algorithms whose worst case memory
behavior is strictly less than O( total number of pixels )?

I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.

No, I mean O(max(M,N)) plus possibly some logarithmic component that
loses significance when images grow bigger.
Moreover, if the bounding rectangle of the shape is A x B then I'd like
memory requirements to be O(max(A,B)), but so far that does not appear
to be possible, or at least not possible without significant
complications and further slowdown. So, as an intermediate goal I am
willing to accept that allocation would be O(max(M,N)), but the amount
of touched memory is O(max(A,B)).

but so
far they are all unreasonably slow - ~5 times slower than
the best.

I'm no longer working on the problem but I'm interested to
hear what you come up with.

Here is what I had in mind.
I tried to optimize as little as I can in order to make it as simple
as I can. Unfortunately, I am not particularly good at it, so the code
still contains a few unnecessary "tricks" that make understanding a
little harder.
The code uses VLAs and recursion for the same purpose of making it less
tricky.
If desired, the memory footprint could easily be reduced by a factor of
8 through use of packed bit arrays instead of arrays of _Bool.
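
The packed variant mentioned above could look roughly like the sketch below. It is an illustration only, not code from the post: the names bitword, bits_words, bits_clear_all, bits_set and bits_test are hypothetical, and the factor of 8 assumes CHAR_BIT == 8 and sizeof(_Bool) == 1, which is typical but not guaranteed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char bitword;   /* hypothetical: one element holds CHAR_BIT flags */

/* Number of bitword elements needed to store n flags. */
static size_t bits_words(size_t n)
{
    return (n + CHAR_BIT - 1) / CHAR_BIT;
}

/* Clear all n flags (rounded up to whole elements). */
static void bits_clear_all(bitword *bits, size_t n)
{
    memset(bits, 0, bits_words(n));
}

/* Set flag i of an array that holds n flags. */
static void bits_set(bitword *bits, size_t n, size_t i)
{
    assert(i < n);
    bits[i / CHAR_BIT] |= (bitword)(1u << (i % CHAR_BIT));
}

/* Return 1 if flag i is set, 0 otherwise. */
static int bits_test(const bitword *bits, size_t n, size_t i)
{
    assert(i < n);
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

A VLA such as `_Bool todo[2][ny]` would then become something like `bitword todo[2][bits_words(ny)]`, at the price of a shift and a mask on every access.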

Even in this relatively crude form, for the majority of shapes this code
is blazingly fast.
Unfortunately, in the worst case (both 'slalom' shapes) the execution
time is O(max(A,B)**3), which makes it unfit as a general-purpose
routine. At the moment I don't see a solution for this problem.
Overall, it's probably a dead end.

A solution (sort of) is in line with the famous quote of David Wheeler
- to turn todo lists from bit maps into arrays of
abscissas-or-ordinates of contact points.

The cost is a memory footprint - 4x bigger than the previous version, 32
times bigger than above-mentioned "packed" variant of the previous
version. But in BigO sense it's the same.
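
To make the trade-off concrete, here is a minimal sketch of a count-prefixed coordinate list, the same shape that floodfill4_add in the code further down operates on. The helper names todo_clear, todo_add and todo_count_contacts, and the explicit capacity check, are made up for illustration. With 4-byte int entries versus 1-byte _Bool flags (typical sizes, not guaranteed) the list costs 4x the memory, but a scan touches only the recorded contacts instead of the whole edge.

```c
#include <assert.h>
#include <stddef.h>

/* Count-prefixed list: list[0] holds the number of entries,
   list[1..list[0]] hold the coordinates of contact points. */
static void todo_clear(int *list)
{
    list[0] = 0;
}

/* Append val; the caller is assumed to have allocated capacity+1 ints. */
static void todo_add(int *list, int capacity, int val)
{
    int n = list[0];
    assert(n < capacity);
    list[n + 1] = val;
    list[0] = n + 1;
}

/* Count listed positions whose pixel on the given edge still has old_color.
   stride is the distance between adjacent edge pixels.
   The cost is proportional to list[0], not to the edge length. */
static int todo_count_contacts(const int *list, const unsigned char *edge,
                               ptrdiff_t stride, unsigned char old_color)
{
    int found = 0;
    for (int i = 1; i <= list[0]; ++i) {
        if (edge[list[i] * stride] == old_color)
            ++found;
    }
    return found;
}
```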

In my tests it reduced the worst case time from O(max(A,B)**3) to
O(A*B*log(max(A,B))), which is non-ideal, but probably acceptable,
because the bad cases should be very rare in practice.

The real trouble is different - I don't know if my "worst case" is
really the worst.

The code below is for presentation of the algorithm in both a clear and
compact manner, with emphasis on symmetry between x and y directions.
It is not optimal in any sense and can be made non-trivially faster both
by algorithm enhancements and by specialization of critical loops.
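
The x/y symmetry rests on addressing pixels through a per-axis stride, which is what ld[] holds in the code below. As a small illustration of the idea (not part of the posted code; run_length and the size[] parameter are hypothetical), one loop can measure a run of pixels along either a row or a column:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Color;

enum { x_i = 0, y_i = 1 };   /* axis indices, as in the posted code */

/* Length of the run of `color` that starts at (pos[x_i], pos[y_i]) and
   extends in the positive direction of `axis`.
   ld[]   - distance between adjacent pixels along each axis: {1, width}
   size[] - image extent along each axis: {width, height}
   The same loop walks a row (axis == x_i) or a column (axis == y_i). */
static int run_length(const Color *image, const ptrdiff_t ld[2],
                      const int size[2], const int pos[2],
                      int axis, Color color)
{
    assert(axis == x_i || axis == y_i);
    ptrdiff_t off = ld[x_i] * pos[x_i] + ld[y_i] * pos[y_i];
    int n = 0;
    for (int i = pos[axis]; i < size[axis] && image[off] == color; ++i) {
        ++n;
        off += ld[axis];   /* step one pixel along the chosen axis */
    }
    return n;
}
```

Swapping the axis argument is then all it takes to turn a horizontal scan into a vertical one, which seems to be the property the posted floodfill4_expand relies on.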

#include <stddef.h>
#include <string.h>

typedef unsigned char Color;

enum coordinate_axes {
x_i = 0, y_i, // index of pos[], ld[], 1st index of limits[][], todo[][]
};
enum from_to {
from_i = 0, to_i // 2nd index of limits[][], todo[][], I use 0 and 1 more commonly
};
enum { // indices of todo[] lists
le_i = x_i*2+from_i, ri_i = x_i*2+to_i,
up_i = y_i*2+from_i, dn_i = y_i*2+to_i,
};

#define IDX2INC(ft_idx) ((int)(ft_idx)*2 - 1)
#define X2Y(axis) ((axis) ^ 1)

typedef struct {
Color* image;
Color old_color, new_color;
ptrdiff_t ld[2]; // {1, width}
int limits[2][2]; // {{0, width-1}, {0, height-1}}
} floodfill4_param;

typedef struct {
int *todo[4]; // {left, right, up, down} - first item holds the # of active entries
int limits[2][2]; // {{x0, x1}, {y0, y1}}
int pos[2]; // {x, y}
} floodfill4_state;

static void floodfill4_core(
const floodfill4_param* prm,
const floodfill4_state* arg);
static _Bool floodfill4_expand(
const floodfill4_param* prm,
floodfill4_state* s,
enum coordinate_axes axis, enum from_to ft_idx);

int floodfill4(
Color* image,
int width, int height,
int x, int y,
Color old_color, Color new_color)
{
if (width <= 0 || height <= 0)
return 0;

if (x < 0 || x >= width || y < 0 || y >= height)
return 0;

if (image[(size_t)width*y+x] != old_color)
return 0;

int lr_todo[2][height+1];
int ud_todo[2][width+1];
floodfill4_param prm = {
.image = image,
.ld[x_i] = 1,
.ld[y_i] = width,
.limits = {{ 0, width-1}, {0, height-1}},
.old_color = old_color,
.new_color = new_color,
};
floodfill4_state s = {
.todo[le_i] = lr_todo[0],
.todo[ri_i] = lr_todo[1],
.todo[up_i] = ud_todo[0],
.todo[dn_i] = ud_todo[1],
.limits[x_i] = {x, x},
.limits[y_i] = {y, y},
.pos[x_i] = x, .pos[y_i] = y,
};
for (int i = 0; i < 4; ++i)
*s.todo[i] = 0;

// process central 1x1 rectangle
floodfill4_core(&prm, &s);

// expansion loop
for (unsigned idx = 0; idx < 4;) {
if (floodfill4_expand(&prm, &s, idx/2, idx % 2)) { // try to expand
idx = 0; // expansion succeed - restart from beginning
continue;
}
++idx;
}

return 1;
}

static inline
void floodfill4_add(int* list, int val)
{
int n = list[0];
list[n+1] = val;
list[0] = n + 1;
}

// floodfill4_core - floodfill4 recursively in divide and conquer fashion
// arg->*_todo arrays (lists) initialized by caller.
// floodfill4_core adds to *_todo values that indicate need for further
// action, but never removes anything
static void floodfill4_core(const floodfill4_param* prm, const floodfill4_state* arg) {
const int ni[2] = {
arg->limits[x_i][1]-arg->limits[x_i][0],
arg->limits[y_i][1]-arg->limits[y_i][0],
};
if (ni[x_i] + ni[y_i] == 0) { // nx==ny==1
prm->image[prm->ld[y_i]*arg->limits[y_i][0]+arg->limits[x_i][0]] = prm->new_color;
floodfill4_add(arg->todo[le_i], arg->limits[y_i][0]);
floodfill4_add(arg->todo[ri_i], arg->limits[y_i][0]);
floodfill4_add(arg->todo[up_i], arg->limits[x_i][0]);
floodfill4_add(arg->todo[dn_i], arg->limits[x_i][0]);
return;
}

floodfill4_state args[2];
args[0] = args[1] = *arg;
const enum coordinate_axes axis = ni[x_i] > ni[y_i] ?
x_i : // split vertically
y_i ; // split horizontally
int todo[2][ni[X2Y(axis)]+2]; // contacts between halves
const int hpos = (arg->limits[axis][0] + arg->limits[axis][1])/2; // split point
args[0].todo[axis*2+1] = todo[0];
args[0].limits[axis][1] = hpos;
args[1].todo[axis*2+0] = todo[1];
args[1].limits[axis][0] = hpos + 1;
int todo_i = arg->pos[axis] > hpos;
todo[todo_i][0] = 0; // empty todo list
floodfill4_core(prm, &args[todo_i]);
if (todo[todo_i][0] != 0) {
// do ping-pong between halves
args[0].pos[axis] = hpos;
args[1].pos[axis] = hpos+1;
const ptrdiff_t lda = prm->ld[axis];
const ptrdiff_t ldb = prm->ld[X2Y(axis)];
Color* edge = &prm->image[lda*hpos];
do {
// look for contact points on destination edge
int* from = todo[todo_i];
Color *edge_dst = todo_i ? edge : edge + lda;
todo_i = 1 - todo_i;
todo[todo_i][0] = 0;
int np = *from++;
do {
int pos = *from++;
if (edge_dst[pos*ldb] == prm->old_color) { // contact found
args[todo_i].pos[X2Y(axis)] = pos;
floodfill4_core(prm, &args[todo_i]);
}
} while (--np);
} while (todo[todo_i][0] != 0);
}
}

static
_Bool floodfill4_expand(
const floodfill4_param* prm,
floodfill4_state* s,
enum coordinate_axes axis,
enum from_to ft_idx)
{ // try to expand
int* src_todo = s->todo[axis*2+ft_idx];
if (*src_todo == 0)
return 0;

int src_pos = s->limits[axis][ft_idx];
if (src_pos == prm->limits[axis][ft_idx]) {
*src_todo = 0;
return 0;
}

typedef struct {
int pos0, pos1;
} interval_t;

const ptrdiff_t lda = prm->ld[axis];
const ptrdiff_t ldb = prm->ld[X2Y(axis)];
Color* src_col = &prm->image[lda*src_pos];
Color* exp_col = &src_col[lda*IDX2INC(ft_idx)];
const int ort_limit0 = s->limits[X2Y(axis)][0];
const int ort_limit1 = s->limits[X2Y(axis)][1];
const Color c0 = exp_col[ldb*ort_limit0]; // preserve upper corner
const Color c1 = exp_col[ldb*ort_limit1]; // preserve lower corner
interval_t workbuf[(ort_limit1 - ort_limit0+2)/2];
interval_t* wr = workbuf;
s->pos[axis] = src_pos;
int n_todo = src_todo[0];
do {
// look for contact
int pos = src_todo[n_todo--]; // use src_todo as stack, popping from the top
Color* pt = &exp_col[ldb*pos];
if (*pt == prm->old_color) { // contact found
*pt = prm->new_color;
// extend backward
Color* p = pt - ldb;
int pos0;
for (pos0 = pos-1; pos0 >= ort_limit0 && *p == prm->old_color; p -= ldb, --pos0)
*p = prm->new_color;
pos0 += 1;
// extend forward
p = pt + ldb;
int pos1;
for (pos1 = pos+1; pos1 <= ort_limit1 && *p == prm->old_color; p += ldb, ++pos1)
*p = prm->new_color;
pos1 -= 1;

// add interval to result list
wr->pos0 = pos0;
wr->pos1 = pos1;
++wr;

if (pos0 != pos1) {
// bounce - apply new found interval to original rectangle
src_todo[0] = n_todo;
pos = pos0;
p = &src_col[ldb*pos];
do {
if (*p == prm->old_color) { // contact found
s->pos[X2Y(axis)] = pos;
floodfill4_core(prm, s);
n_todo = src_todo[0];
++pos;
p += ldb;
}
p += ldb;
} while (++pos <= pos1);
}
}
} while (n_todo != 0);

if (wr == workbuf)
return 0; // rectangle not expanded

// rectangle expanded
// handle corners of expanded rectangle
int exp_pos = src_pos + IDX2INC(ft_idx);
s->limits[axis][ft_idx] = exp_pos;
if (exp_col[ldb*ort_limit0] != c0) // corner0 modified
floodfill4_add(s->todo[X2Y(axis)*2+0], exp_pos); // add to todo0 list
if (exp_col[ldb*ort_limit1] != c1) // corner1 modified
floodfill4_add(s->todo[X2Y(axis)*2+1], exp_pos); // add to todo1 list

// turn intervals to list
interval_t* rd = workbuf;
int* dst_todo = &src_todo[1];
do {
int pos = rd->pos0;
int pos1 = rd->pos1;
do
*dst_todo++ = pos;
while (++pos <= pos1);
++rd;
} while (rd != wr);
src_todo[0] = dst_todo - src_todo - 1;

return 1;
}

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Tim Rentsch@21:1/5 to Michael S on Wed May 15 09:57:39 2024

Michael S <[email protected]> writes:

On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Finally found the time for speed measurements. [...]

I got these. Thank you.

The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.

The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.

What are your absolute result?
Are they much faster, much slower or similar to mine?
Also it would help if you find out characteristics of your
test hardware.

I think trying to look at those wouldn't tell me anything
helpful. Too many unknowns. And still no way to test or
measure any changes to the various algorithms.

Frankly, I don't understand.
If you have troubles with testing on shared hardware then you can
always test on the hardware that you own and has full control.
Even if it is a little old, the trends tend to be the same. At
least I clearly see the same trends on my almost 12 y.o. home PC
and on relatively modern EPYC3.

I have put this problem aside. It's a lot of work even if I had
a way to make substantive progress, and at present I don't.
Maybe more sometime later but for now I think suspending is the
only workable choice available.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to Michael S on Wed Jun 5 17:59:07 2024

On Wed, 5 Jun 2024 17:45:45 +0300
Michael S <[email protected]> wrote:

On Fri, 3 May 2024 18:33:05 +0300
Michael S <[email protected]> wrote:

On Thu, 25 Apr 2024 17:56:06 +0300
Michael S <[email protected]> wrote:

A solution (sort of) is in line with the famous quote of David
Wheeler - to turn todo lists from bit maps into arrays of
abscissas-or-ordinates of contact points.

The cost is a memory footprint - 4x bigger than the previous
version, 32 times bigger than above-mentioned "packed" variant of
the previous version. But in BigO sense it's the same.

In my tests it reduced the worst case time from O(max(A,B)**3) to
O(A*B*log(max(A,B))), which is non-ideal, but probably acceptable,
because the bad cases should be very rare in practice.

The real trouble is different - I don't know if my "worst case" is
really the worst.

The code below is for presentation of the algorithm in both a clear and
compact manner, with emphasis on symmetry between x and y
directions. It is not optimal in any sense and can be made
non-trivially faster both by algorithm enhancements and by
specialization of critical loops.

The following code improves on ideas from the previous post.
Unlike the previous one, it is purely iterative, with no recursion.
The algorithm is simpler and accesses storage in a more compact manner,
i.e. the accessed memory area starts from the beginning and grows
according to need. The previous attempt did not have this property.
It's still longer and less simple than I would like.

And here is something that I found by chance when developing the code
presented in the previous post.
Unlike for the previous one, I cannot prove that the memory requirements
of this algorithm are O(N). However, for all my test cases it's not
just O(N), but consumes significantly less memory than the one above.
And it is simpler and shorter.

// HIS - todo stack of Horizontal Intervals
// with periodic Squeeze of empty intervals
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>

typedef unsigned char Color;

int floodfill4(
Color* image,
int width, int height,
int x, int y,
Color old_color, Color new_color)
{
if (width <= 0 || height <= 0)
return 0;

if (x < 0 || x >= width || y < 0 || y >= height)
return 0;

size_t w = width;
Color* row = &image[w*y];
if (row[x] != old_color)
return 0;

typedef struct {
int x0, x1, y;
int8_t from; // -1 => from y-1, +1 => from y+1
} interval_t;

enum {
INITIAL_STACK_SIZE = 128,
SQUEEZE_THR = 32,
};
interval_t* stack_base =
malloc(INITIAL_STACK_SIZE*sizeof(*stack_base));
if (!stack_base)
return -1;
interval_t* stack_end = &stack_base[INITIAL_STACK_SIZE];
interval_t* todo = stack_base;

// recolor initial horizontal interval
row[x] = new_color;
// look backward
int x00;
for (x00 = x-1; x00 >= 0 && row[x00]==old_color; --x00)
row[x00] = new_color;
x00 += 1;
// look forward
int x01;
for (x01 = x+1; x01 < width && row[x01]==old_color; ++x01)
row[x01] = new_color;
x01 -= 1;

// push neighbors of initial interval on todo stack
for (int from = -1; from <= 1; from += 2) {
unsigned next_y = y-from;
if (next_y < (unsigned)height) {
todo->x0 = x00;
todo->x1 = x01;
todo->y = next_y;
todo->from = from;
++todo;
}
}

interval_t* squeezed = stack_base;
unsigned periodic_i = 0;
while (todo != stack_base) {
--todo; // pop interval from todo stack
int xBeg = todo->x0;
int xEnd = todo->x1;
int y = todo->y;

if (todo < squeezed)
squeezed = todo;

// look for target points
Color* row = &image[y*w];
int x = xBeg;
do {
if (row[x] == old_color) { // target found
row[x] = new_color;
int x0 = x;
if (x == xBeg) {
// look backward
for (x0 = x-1; x0 >= 0 && row[x0]==old_color; --x0)
row[x0] = new_color;
x0 += 1;
}
// look forward
int x1;
for (x1 = x+1; x1 < width && row[x1]==old_color; ++x1)
row[x1] = new_color;
x1 -= 1;

int from = todo->from;
// remaining part of current interval
if (x1+2 <= xEnd) {
todo->x0 = x+2;
todo->x1 = xEnd;
todo->y = y;
todo->from = from;
++todo;
}
// forward continuation
unsigned next_y = y-from;
if (next_y < (unsigned)height) {
todo->x0 = x0;
todo->x1 = x1;
todo->y = next_y;
todo->from = from;
++todo;
}
// bounces
y = y+from;
if (xEnd+2 <= x1) { // bounce on the right side
todo->x0 = xEnd+2;
todo->x1 = x1;
todo->y = y;
todo->from = -from;
++todo;
}
if (x0 <= xBeg-2) { // bounce on the left side
todo->x0 = x0;
todo->x1 = xBeg-2;
todo->y = y;
todo->from = -from;
++todo;
}
break;
}
++x;
} while (x <= xEnd);

++periodic_i;
if ((periodic_i & 31)==0) { // maintenance
if (todo - squeezed >= SQUEEZE_THR) {
// squeeze empty intervals
interval_t* wr = squeezed;
while (squeezed != todo) {
Color* row = &image[squeezed->y*w];
for (int x = squeezed->x0; x <= squeezed->x1; ++x) {
if (row[x] == old_color) { // interval non-empty
*wr = *squeezed;
wr->x0 = x;
++wr;
break;
}
}
++squeezed;
}
todo = squeezed = wr;
}

if (stack_end-todo < 67) {
// Allocate more space
size_t todo_i = todo - stack_base;
size_t squeezed_i = squeezed - stack_base;
size_t sz = stack_end - stack_base;
sz += (sz/128)*64;
interval_t* tmp = realloc(
stack_base, sz*sizeof(*stack_base));
if (!tmp) {
free(stack_base);
return -1;
}
stack_base = tmp;
stack_end = &stack_base[sz];
todo = &stack_base[todo_i];
squeezed = &stack_base[squeezed_i];
}
}
}

free(stack_base);
return 1;
}
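
For testing any of the floodfill4 variants in this thread, a deliberately naive reference with the same signature is handy as an oracle. The following is a sketch under assumptions, not one of the posted algorithms: the name floodfill4_ref is hypothetical, it uses O(width*height) memory rather than the O(max(width,height)) aimed at above, and unlike the posted versions it returns 0 when old_color equals new_color.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned char Color;

/* Naive 4-connected fill using an explicit heap-allocated stack of pixel
   indices. Returns 1 if pixels were recolored, 0 if there was nothing to do,
   -1 on allocation failure. */
static int floodfill4_ref(Color *image, int width, int height,
                          int x, int y, Color old_color, Color new_color)
{
    assert(image != NULL);
    if (width <= 0 || height <= 0)
        return 0;
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (old_color == new_color)
        return 0;

    size_t w = (size_t)width;
    size_t h = (size_t)height;
    size_t start = w * (size_t)y + (size_t)x;
    if (image[start] != old_color)
        return 0;

    /* A pixel is recolored at the moment it is pushed, so it is pushed at
       most once and w*h slots always suffice. */
    size_t *stack = malloc(w * h * sizeof *stack);
    if (!stack)
        return -1;
    size_t top = 0;
    image[start] = new_color;
    stack[top++] = start;

    while (top != 0) {
        size_t idx = stack[--top];
        size_t px = idx % w;
        size_t py = idx / w;
        if (px > 0 && image[idx - 1] == old_color) {      /* left */
            image[idx - 1] = new_color;
            stack[top++] = idx - 1;
        }
        if (px + 1 < w && image[idx + 1] == old_color) {  /* right */
            image[idx + 1] = new_color;
            stack[top++] = idx + 1;
        }
        if (py > 0 && image[idx - w] == old_color) {      /* up */
            image[idx - w] = new_color;
            stack[top++] = idx - w;
        }
        if (py + 1 < h && image[idx + w] == old_color) {  /* down */
            image[idx + w] = new_color;
            stack[top++] = idx + w;
        }
    }
    free(stack);
    return 1;
}
```

Running a candidate and this reference on copies of the same image and comparing the buffers with memcmp is one simple way to catch missed or over-filled pixels.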

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to Michael S on Wed Jun 5 17:45:45 2024

On Fri, 3 May 2024 18:33:05 +0300
Michael S <[email protected]> wrote:

On Thu, 25 Apr 2024 17:56:06 +0300
Michael S <[email protected]> wrote:

A solution (sort of) is in line with the famous quote of David Wheeler
- to turn todo lists from bit maps into arrays of
abscissas-or-ordinates of contact points.

The cost is a memory footprint - 4x bigger than the previous version,
32 times bigger than above-mentioned "packed" variant of the previous version. But in BigO sense it's the same.

In my tests it reduced the worst case time from O(max(A,B)**3) to
O(A*B*log(max(A,B))), which is non-ideal, but probably acceptable,
because the bad cases should be very rare in practice.

The real trouble is different - I don't know if my "worst case" is
really the worst.

The code below is for presentation of the algorithm in both a clear and
compact manner, with emphasis on symmetry between x and y directions.
It is not optimal in any sense and can be made non-trivially faster
both by algorithm enhancements and by specialization of critical loops.

The following code improves on ideas from the previous post.
Unlike the previous one, it is purely iterative, with no recursion.
The algorithm is simpler and accesses storage in a more compact manner,
i.e. the accessed memory area starts from the beginning and grows
according to need. The previous attempt did not have this property.
It's still longer and less simple than I would like.

// try+split algorithm with flat storage
// - horizontal intervals
// - two stacks: main stack for intervals,
// auxiliary stack of areas of interest (AoI)
// - both stacks implemented as arrays
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#define NDEBUG
#include <assert.h>
#include <stdio.h>

typedef unsigned char Color;

typedef struct {
size_t n_intervals;
size_t n_splits;
} stack_sizes_t;

static
stack_sizes_t floodfill4_calc_stack_size(int width, int height)
{
stack_sizes_t sz = { .n_intervals = 1, .n_splits = 1 };
for (;;) {
ptrdiff_t len;
if (width > height) { // split vertically
len = height;
width = (width + 1)/2;
} else { // split horizontally
len = width;
height = (height + 1)/2;
}
if (len <= 1)
break;
sz.n_intervals += len*2 + 4;
sz.n_splits += 1;
}
return sz;
}

int floodfill4(
Color* image,
int width, int height,
int x, int y,
Color old_color, Color new_color)
{
if (width <= 0 || height <= 0)
return 0;

if (x < 0 || x >= width || y < 0 || y >= height)
return 0;

size_t w = width;
Color* row = &image[w*y];
if (row[x] != old_color)
return 0;

enum coordinate_axes {
x_i = 0, y_i, // index of pos[], MS bit of index of limits[]
};
#define X2Y(axis) ((axis) ^ 1)
enum beg_or_end {
beg_i = 0, end_i // LS bit of index of limits[],
// I use 0 and 1 more commonly
};
enum limits_idx { // index of limits[]
x0_i = x_i*2+beg_i,
x1_i = x_i*2+end_i,
y0_i = y_i*2+beg_i,
y1_i = y_i*2+end_i,
};

typedef struct {
int x0, x1, y;
int from; // 0 => from y-1, 1 => from y+1
} interval_t;
typedef struct {
interval_t* parent_todo;
int saved_limit_val;
uint8_t saved_limit_idx; // axis*2+beg_or_end
int frame_capacity_deficit;
} parent_info_t;

stack_sizes_t stacks_len = floodfill4_calc_stack_size(width, height);
const size_t parent_info_sz = stacks_len.n_splits * sizeof(parent_info_t);
const size_t todo_sz = stacks_len.n_intervals * sizeof(interval_t);
void* stacks = malloc(parent_info_sz + todo_sz);
if (!stacks)
return -1;

parent_info_t* parents_stack = stacks;
parent_info_t* parents_stack_end = &parents_stack[stacks_len.n_splits];
interval_t* todo_stack = (interval_t*)parents_stack_end;
interval_t* todo = todo_stack;
#ifndef NDEBUG
interval_t* todo_stack_end = &todo[stacks_len.n_intervals];
#endif

int limits[2*2] = { 0, width-1, 0, height-1}; // {x0, x1, y0, y1};

// recolor initial horizontal interval
row[x] = new_color;
// look backward
int x00;
for (x00 = x-1; x00 >= 0 && row[x00]==old_color; --x00)
row[x00] = new_color;
x00 += 1;
// look forward
int x01;
for (x01 = x+1; x01 < width && row[x01]==old_color; ++x01)
row[x01] = new_color;
x01 -= 1;
// push neighbors of initial interval on todo stack
for (enum beg_or_end from = beg_i; from <= end_i; ++from) {
unsigned next_y = y+1-from*2;
if (next_y < (unsigned)height) {
todo->x0 = x00;
todo->x1 = x01;
todo->y = next_y;
todo->from = from;
++todo;
}
}

parent_info_t* parent_aoi = parents_stack;
interval_t* parent_todo = todo_stack;
ptrdiff_t frame_capacity = width < height ? width : height;
for (;;) {
while (todo != parent_todo) {
assert(todo_stack_end != todo);
assert(parent_todo >= todo_stack && parent_todo <= todo_stack_end);
// Get interval from top of todo stack
--todo; // pop interval from todo stack
int xBeg = todo->x0;
int xEnd = todo->x1;
int y = todo->y;
int from = todo->from;
// check range
if ((unsigned)(y-limits[y0_i]) > (unsigned)(limits[y1_i]-limits[y0_i]) ||
xEnd < limits[x0_i] || xBeg > limits[x1_i]) {
// Whole interval belongs to parent
// Bring value from the bottom of todo stack to the top
// freeing stack slot for parent stack
assert(todo_stack_end != todo);
*todo = *parent_todo; ++todo;
// Store interval on top of parent stack
parent_todo->x0 = xBeg;
parent_todo->x1 = xEnd;
parent_todo->y = y;
parent_todo->from = from;
++parent_todo;
continue;
}
// At least a part of the interval is in current rectangle
if (xBeg < limits[x0_i]) {
// left part of interval belongs to parent
// Store left part of interval on todo stack
// for later demotion to parent's stack
assert(todo_stack_end != todo);
todo->x0 = xBeg;
todo->x1 = limits[x0_i]-1;
todo->y = y;
todo->from = from;
++todo;
xBeg = limits[x0_i]; // adjust xBeg
}
if (xEnd > limits[x1_i] ) {
// right part of interval belongs to parent
// Store right part of interval on todo stack
// for later demotion to parent's stack
assert(todo_stack_end != todo);
todo->x0 = limits[x1_i]+1;
todo->x1 = xEnd;
todo->y = y;
todo->from = from;
++todo;
xEnd = limits[x1_i]; // adjust xEnd
}
// remaining part of interval is within limits

// look for target points
Color* row = &image[y*w];
int x = xBeg;
do {
if (row[x] == old_color) { // target found
if (todo-parent_todo > frame_capacity) {
// can't complete floodfill of current rectangle
// due to space constraints.
// Split
const int dLim[] = {
limits[x1_i]-limits[x0_i],
limits[y1_i]-limits[y0_i]};
const enum coordinate_axes axis =
dLim[x_i] > dLim[y_i] ?
x_i : // split vertically
y_i ; // split horizontally
// select half
const int hpos0 = (limits[axis*2+0] +
limits[axis*2+1])/2; // lower split point
const int pos[2] = { [x_i] = x, [y_i] = y};
enum beg_or_end src_i = pos[axis] > hpos0;

// preserve state of current rectangle on parents stack
assert(parent_aoi != parents_stack_end);
parent_aoi->parent_todo = parent_todo;
enum beg_or_end save_i = 1 - src_i;
enum limits_idx saved_limit_idx = axis*2+save_i;
parent_aoi->saved_limit_idx = saved_limit_idx;
parent_aoi->saved_limit_val = limits[saved_limit_idx];
parent_aoi->frame_capacity_deficit =
todo-parent_todo - frame_capacity;
++parent_aoi;

// switch processing to selected half of rectangle
frame_capacity = dLim[X2Y(axis)] + 1;
limits[saved_limit_idx] = hpos0+src_i;
parent_todo = todo;
// push interval on fresh todo stack
assert(todo_stack_end != todo);
todo->x0 = x;
todo->x1 = xEnd;
todo->y = y;
todo->from = from;
++todo;
break;
}

row[x] = new_color;
// look forward
int x1;
for (x1 = x+1; x1 < width && row[x1]==old_color; ++x1)
row[x1] = new_color;
x1 -= 1;

int x0 = x;
if (x == xBeg) {
// look backward
for (x0 = x-1; x0 >= 0 && row[x0]==old_color; --x0)
row[x0] = new_color;
x0 += 1;

// bounce
if (x0 <= xBeg-2) {
assert(todo_stack_end != todo);
todo->x0 = x0;
todo->x1 = xBeg-2;
todo->y = y+from*2-1;
todo->from = 1-from;
++todo;
}
}
// bounce
if (x1 >= xEnd+2) {
assert(todo_stack_end != todo);
todo->x1 = x1;
todo->x0 = xEnd+2;
todo->y = y+from*2-1;
todo->from = 1-from;
++todo;
}
unsigned next_y = y+1-from*2;
if (next_y < (unsigned)height) {
// continuation
#if 1
// The following if is not necessary for correctness
// It is here to speed up few test cases
if (y != limits[y0_i+1-from] &&
x0 >= limits[x0_i] &&
x1 <= limits[x1_i] &&
x1+2 > xEnd &&
todo-parent_todo <= frame_capacity)
{ // Bypass stack
// Advance vertically in the same direction
xBeg = x = x0;
xEnd = x1;
y = next_y;
row = &image[y*w];
continue;
}
#endif
// put new interval on current stack
assert(todo_stack_end != todo);
todo->x0 = x0;
todo->x1 = x1;
todo->y = next_y;
todo->from = from;
++todo;
}
x = x1+1;
}
++x;
} while (x <= xEnd);
}

if (parent_aoi == parents_stack)
break; // top AOI finished

// back to parent rectangle
--parent_aoi;
parent_todo = parent_aoi->parent_todo;
limits[parent_aoi->saved_limit_idx]= parent_aoi->saved_limit_val;
frame_capacity = todo - parent_todo
- parent_aoi->frame_capacity_deficit;
}
assert((void*)todo == (void*)parents_stack_end);

free(stacks);
return 1;
}
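
As a worked example of the preallocation, the sizing routine can be exercised on its own. The function below repeats floodfill4_calc_stack_size from the code above so that the snippet compiles by itself; the added assert is an assumption of this sketch, not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t n_intervals;   /* slots reserved on the interval (todo) stack */
    size_t n_splits;      /* slots reserved on the parents (AoI) stack */
} stack_sizes_t;

/* Simulate repeated halving of the longer side of the rectangle and add
   up the per-level allowance of 2*len + 4 intervals, where len is the
   length of the side that is not being split. */
static stack_sizes_t floodfill4_calc_stack_size(int width, int height)
{
    assert(width > 0 && height > 0);
    stack_sizes_t sz = { .n_intervals = 1, .n_splits = 1 };
    for (;;) {
        ptrdiff_t len;
        if (width > height) {   /* split vertically */
            len = height;
            width = (width + 1) / 2;
        } else {                /* split horizontally */
            len = width;
            height = (height + 1) / 2;
        }
        if (len <= 1)
            break;
        sz.n_intervals += (size_t)len * 2 + 4;
        sz.n_splits += 1;
    }
    return sz;
}
```

By my trace of the loop, a 4x4 image yields 29 interval slots and 4 split records, and a 200x200 image (the size mentioned at the start of the thread) yields 1265 interval slots and 16 split records. The side lengths halve every second step, so the total appears to stay proportional to the longer side rather than to the pixel count.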

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
