Vir Campestris <
[email protected]d> writes:
I retired last year, and I haven't really written any code since. This
has turned out to be quite a fun little thing of a type I haven't had
time for for YEARS. And oddly I still don't seem to have enough time
for it... It's the garden, and the kid's gardens, and my mum's garden,
and all those holidays :)
But some optimisations. You'll remember in Bonita's first version the
bitmap was initialised to 0xaaaa, because it's a waste of time doing
the sieve for 2.
I pointed out that we don't need to even store the even numbers.
But there's more.
If you look at the bitmap when you've sieved for 2 you see
12 34 56 78
11 10 10 10...
which is a repeat of 2 after an initialisation word. That's the aaaa.
You can do the same with 3
123 456 789
111 110 110 110
except this time the repeat is 3. And annoyingly that doesn't map well
down onto a byte-based architecture. You end up with an initial word,
then a 3-word repeat. (If your word was 24 or 36 bytes it would only
be 1 word, but I haven't seen that since the 1970s)
In hex, with lowest byte first, that is
fd b6 6d db b6 6d db
(That BTW is the same if you only store the odd numbers - a 3-word repeat.)
So rather than start off with your bitmap all set to 1s you can set it
to this repeating pattern. That replaces all the ANDs for all the
values of three with a memory fill with far fewer accesses.
You can do the same with 5:
12345 6789a bcdef
11111 11110 11110 11110
You can then AND your pattern for 3 with the one for 5, and get one
with a repeat length of 15, and set that into your bitmap. You've now replaced about a fifth of all your AND operations with a flood fill.
This can carry on - for a while. Only a short while. You _can_ make a sequence for lots of primes. But it gets quite long, quite
quickly. For up to 23, and not storing the evens, it's over 1E8 words
long!
I was implementing a version of that when something else occurred to
me. You can sacrifice speed for store size if you're prepared to do an integer divide for every prime lookup.
I explained before how numbers can be considered mod 30, of which
only the residues { 1, 7, 11, 13, 17, 19, 23, 29 } are candidates,
because all other residues are divisible by 2, 3, or 5. And very
conveniently, there are exactly 8 of these residues, so one byte
can be used to represent each block of 30 numbers (only 8 of which
might be prime). That cuts down on the space needed.
I also explained how the divides and remainders can be avoided,
by taking advantage of the structure of how the numbers being
considered are represented, and arranging that the difficult parts
be done at compile time.
I have an implementation (written in C) based on this approach that
determines all primes less than roughly 51.75 billion, taking just
under 56 seconds to complete. (No threading is used - code is all
single threaded.)
[...]
But a note for the group of course - optimising this to the max has
nothing to do whatever with C++. The only C++ optimising I've found
myself doing is using raw pointers, not vector's
operator[]. (certainly not the at function). And also I found myself
using emplace_back a lot. It's a PITA because you can only emplace
back a single item, and it is slow.
My program is written in C and uses ordinary C arrays. I expect it
could be converted to C++ without too much difficulty.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)