D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
The constexpr is really very smart because it can speed up algorithms
1000 times according to Dave, Microsoft retired engineer. He has
proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like in
his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up algorithms
1000 times according to Dave, Microsoft retired engineer. He has
proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like in
his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
I didn't see the video (I never see that type of videos), but 270 microseconds sound astonishingly slow for fib(35).
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
On 16/12/2024 10:28, Michael S wrote:
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up
algorithms 1000 times according to Dave, Microsoft retired
engineer. He has proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like
in his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
I didn't see the video (I never see that type of videos), but 270 microseconds sound astonishingly slow for fib(35).
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
There are a dozen different ways to calculate Fibonacci numbers,
ranging from very fast to very slow - but demonstrating different
things. Even your program is going to be very slow compared to just
using φⁿ / √5.
My guess is that the OP is referring to some kind of naïve recursive Fibonacci implementation.
If the function is declared "constexpr"
and the values used are compile-time constants, then maybe the use of "constexpr" leads to more of it being pre-calculated at compile time.
The OP is right that "constexpr" can be very "smart" - but I'm not
sure this is the best calculation for demonstrating that. However, I
have not watched the video either, and maybe it's well presented.
On Mon, 16 Dec 2024 11:18:46 +0100
David Brown <[email protected]> wrote:
On 16/12/2024 10:28, Michael S wrote:
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up
algorithms 1000 times according to Dave, Microsoft retired
engineer. He has proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like
in his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
I didn't see the video (I never see that type of videos), but 270
microseconds sound astonishingly slow for fib(35).
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
There are a dozen different ways to calculate Fibonacci numbers,
ranging from very fast to very slow - but demonstrating different
things. Even your program is going to be very slow compared to just
using φⁿ / √5.
My news reader is not sure about equation above.
You probably meant ceil(pow((1+sqrt(5))/2, n)/sqrt(5))
It works well with double precision math up to n=75. After that it
would need high precision fp library, so probably [much] slower than
simple loop above at least up to n=92 where we run out of range of int64 result.
My guess is that the OP is referring to some kind of naïve recursive
Fibonacci implementation.
Something like that ?
static long long slow_fib(long n)
{
if (n < 1)
return 0;
if (n == 1)
return 1;
return slow_fib(n-2)+slow_fib(n-1);
}
Yes, 270 usec could be considered fast relatively to that. slow_fib(35)
takes 20 msec on my old PC.
If the function is declared "constexpr"
and the values used are compile-time constants, then maybe the use of
"constexpr" leads to more of it being pre-calculated at compile time.
But then, how long would it take to compile?
I'd guess for n=35 it's not too bad, but what about n=92?
Slow algorithm is a slow algorithm. Doing it in compile time can only exaggerate the slowness.
The OP is right that "constexpr" can be very "smart" - but I'm not
sure this is the best calculation for demonstrating that. However, I
have not watched the video either, and maybe it's well presented.
If it is video and it is about programming then it can not be good.
On 16/12/2024 12:23, Michael S wrote:
On Mon, 16 Dec 2024 11:18:46 +0100
David Brown <[email protected]> wrote:
On 16/12/2024 10:28, Michael S wrote:
My guess is that the OP is referring to some kind of naïve
recursive Fibonacci implementation.
Something like that ?
static long long slow_fib(long n)
{
if (n < 1)
return 0;
if (n == 1)
return 1;
return slow_fib(n-2)+slow_fib(n-1);
}
Yes, 270 usec could be considered fast relatively to that.
slow_fib(35) takes 20 msec on my old PC.
That's what I was guessing, yes. Calculating Fibonacci numbers is
often used as an example for showing how to write recursive functions
with a structure like the one above, then looking at different
techniques for improving them - such as using a recursive function
that takes (n, a, b) and returns (n - 1, b, a + b), or perhaps
memoization. (You could probably teach a term's worth of a Haskell
course based entirely on formulations for calculating Fibonacci
functions!)
It can also be an interesting exercise to look at the speed of
calculating Fibonacci numbers for different sizes. I expect your
linear integer function will be fastest for small n, then powers of
phi will be faster for a bit, then you'd switch back to integers of progressively larger but fixed sizes using the recursive formula for
double n :
fib(2 * n - 1) = fib(n) ** 2 + fib(n - 1) ** 2
fib(2 * n) = (2 * fib(n - 1) + fib(n)) * fib(n)
As you get even bigger, you would then want to use fancier ways to do
the multiplication - I don't know if those can be used in a way that interacts well with formulas for the Fibonacci numbers.
I guess that can be left as an exercise for the OP !
On Mon, 16 Dec 2024 13:43:11 +0100
David Brown <[email protected]> wrote:
On 16/12/2024 12:23, Michael S wrote:
On Mon, 16 Dec 2024 11:18:46 +0100
David Brown <[email protected]> wrote:
On 16/12/2024 10:28, Michael S wrote:
My guess is that the OP is referring to some kind of naïve
recursive Fibonacci implementation.
Something like that ?
static long long slow_fib(long n)
{
if (n < 1)
return 0;
if (n == 1)
return 1;
return slow_fib(n-2)+slow_fib(n-1);
}
Yes, 270 usec could be considered fast relatively to that.
slow_fib(35) takes 20 msec on my old PC.
That's what I was guessing, yes. Calculating Fibonacci numbers is
often used as an example for showing how to write recursive functions
with a structure like the one above, then looking at different
techniques for improving them - such as using a recursive function
that takes (n, a, b) and returns (n - 1, b, a + b), or perhaps
memoization. (You could probably teach a term's worth of a Haskell
course based entirely on formulations for calculating Fibonacci
functions!)
It can also be an interesting exercise to look at the speed of
calculating Fibonacci numbers for different sizes. I expect your
linear integer function will be fastest for small n, then powers of
phi will be faster for a bit, then you'd switch back to integers of
progressively larger but fixed sizes using the recursive formula for
double n :
fib(2 * n - 1) = fib(n) ** 2 + fib(n - 1) ** 2
fib(2 * n) = (2 * fib(n - 1) + fib(n)) * fib(n)
As you get even bigger, you would then want to use fancier ways to do
the multiplication - I don't know if those can be used in a way that
interacts well with formulas for the Fibonacci numbers.
There exist precise methods of integer multiplication with complexity
of O(N*Log(N)) where N is number of digits. Personally, I never had a
need for them, but I believe that they work.
I guess that can be left as an exercise for the OP !
Since OP's ideas of "fast" are very humble, he is likely to be
satisfied with the very first step above naive:
static long long less_slow_fib(long n)
{
if (n < 3)
return n < 1 ? 0 : 1;
return less_slow_fib(n-2)*2+less_slow_fib(n-3);
}
On my old PC is calculates fib(35) in 24 usec. That is more than
10 times faster than his idea of fast.
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up
algorithms 1000 times according to Dave, Microsoft retired
engineer. He has proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like
in his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
I didn't see the video (I never see that type of videos), but 270 microseconds sound astonishingly slow for fib(35).
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Michael S <[email protected]> writes:
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up
algorithms 1000 times according to Dave, Microsoft retired
engineer. He has proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like
in his example. It was almost instant at the blink of the eyes.
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 270
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 257
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 171
D:\CmdLine\C_Cpp\Chrono02>program
Fibonacci_c: 9227465
Time Taken: 176
Amazing.
I didn't see the video (I never see that type of videos), but 270 microseconds sound astonishingly slow for fib(35).
Slow for the problem, but not slow for the algorithm. The
point of the video was to compare relative speeds of an
algorithm under two different compilation schemes (with and
without constexpr), not to compare absolute speeds to solve
the problem of computing fibonacci numbers.
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Here is my second fastest fibonacci calculation code (for
relatively small inputs):
typedef long unsigned long ULL;
ULL
fibonacci( unsigned n ){
ULL b = n&1, a = b^1;
if( n & 2 ) a += b, b += a;
if( n & 4 ) a += b, b += a, a += b, b += a;
if( n & 8 ){
ULL na = 13*a+21*b, nb = 21*a+34*b;
a = na, b = nb;
}
n >>= 4;
while( n-- ){
ULL na = 610*a + 987*b, nb = 987*a + 1597*b;
a = na, b = nb;
}
return b;
}
My fastest fibonacci code uses recursion. :)
Student Project <[email protected]d> writes:
The constexpr is really very smart because it can speed up algorithms
1000 times according to Dave, Microsoft retired engineer. He has proved it >> by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like in his
example. It was almost instant at the blink of the eyes.
Can you post the relevant code?
[...]
I am pretty sure that better variant exists, but for tonight it's
enough.
On Mon, 16 Dec 2024 11:18:46 +0100
David Brown <[email protected]> wrote:
On 16/12/2024 10:28, Michael S wrote:
On Sun, 15 Dec 2024 20:20:42 +0000
Student Project <[email protected]d> wrote:
The constexpr is really very smart because it can speed up
algorithms 1000 times according to Dave, Microsoft retired
engineer. He has proved it by creating this video:
<https://youtu.be/8-VZoXn8f9U?si=iy1UimoWcaLG31Xi>
On my computer it took 270 microseconds to calculate fib(35) like
in his example. It was almost instant at the blink of the eyes.
[...]
Amazing.
I didn't see the video (I never see that type of videos), but 270
microseconds sound astonishingly slow for fib(35).
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
There are a dozen different ways to calculate Fibonacci numbers,
ranging from very fast to very slow - but demonstrating different
things. Even your program is going to be very slow compared to
just using
[garbled; presumably ceil(pow((1+sqrt(5))/2, n)/sqrt(5))]
My news reader is not sure about equation above.
You probably meant ceil(pow((1+sqrt(5))/2, n)/sqrt(5))
It works well with double precision math up to nu. After that it
would need high precision fp library, so probably [much] slower
than simple loop above at least up to n' where we run out of range
of int64 result.
My guess is that the OP is referring to some kind of naive
recursive Fibonacci implementation.
Something like that ?
static long long slow_fib(long n)
{
if (n < 1)
return 0;
if (n == 1)
return 1;
return slow_fib(n-2)+slow_fib(n-1);
}
Yes, 270 usec could be considered fast relatively to that.
slow_fib(35) takes 20 msec on my old PC.
If the function is declared "constexpr" and the values used are
compile-time constants, then maybe the use of "constexpr" leads
to more of it being pre-calculated at compile time.
But then, how long would it take to compile?
I'd guess for n5 it's not too bad, but what about n'?
Slow algorithm is a slow algorithm. Doing it in compile time can
only exaggerate the slowness.
The OP is right that "constexpr" can be very "smart" - but I'm
not sure this is the best calculation for demonstrating that.
However, I have not watched the video either, and maybe it's well
presented.
If it is video and it is about programming then it can not be good.
Michael S <[email protected]> writes:
My news reader is not sure about equation above.
You probably meant ceil(pow((1+sqrt(5))/2, n)/sqrt(5))
It works well with double precision math up to nu. After that it
would need high precision fp library, so probably [much] slower
than simple loop above at least up to n' where we run out of range
of int64 result.
<+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>
#include <iostream>
#include <chrono>
int fibonacci(int n)
{
if (n <= 1)
return n;
return fibonacci(n - 2) + fibonacci(n - 1);
}
constexpr int fibonacci_c(int n)
{
if (n <= 1)
return n;
return fibonacci_c(n - 2) + fibonacci_c(n - 1);
}
int main(void)
{
// using namespace std::literals::chrono_literals;
auto start = std::chrono::high_resolution_clock::now();
constexpr int num = 35;
/*constexpr*/ int result_c = fibonacci_c(num);
std::cout << "Fibonacci_c " << result_c << "\n";
std::cout << "Time taken: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()
- start).count() << "\n";
start = std::chrono::high_resolution_clock::now();
int result = fibonacci(num);
std::cout << "Fibonacci " << result << "\n";
std::cout << "Time taken: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()
- start).count() << "\n";
}
<+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On Wed, 18 Dec 2024 03:26:15 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
My news reader is not sure about equation above.
You probably meant ceil(pow((1+sqrt(5))/2, n)/sqrt(5))
It works well with double precision math up to nu. After that it
would need high precision fp library, so probably [much] slower
than simple loop above at least up to n' where we run out of range
of int64 result.
It seems like your newsreader don't like combinations of characters that
I considered benign. Like n=75 and n=92
In case you want to see what was an original, it is here:
[link]
On Tue, 17 Dec 2024 12:54:19 -0800[...]
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Here is my second fastest fibonacci calculation code (for
relatively small inputs):
typedef long unsigned long ULL;
ULL
fibonacci( unsigned n ){
ULL b = n&1, a = b^1;
if( n & 2 ) a += b, b += a;
if( n & 4 ) a += b, b += a, a += b, b += a;
if( n & 8 ){
ULL na = 13*a+21*b, nb = 21*a+34*b;
a = na, b = nb;
}
n >>= 4;
while( n-- ){
ULL na = 610*a + 987*b, nb = 987*a + 1597*b;
a = na, b = nb;
}
return b;
}
From BigO perspective this code looks like it's still O(n) so no
better than simple loop.
According to my measurement gear, in range 0 to 92 there are few
points where it is faster than simple loop, but in majority of
cases it is slower.
Michael S <[email protected]> writes:
On Tue, 17 Dec 2024 12:54:19 -0800
Tim Rentsch <[email protected]> wrote:
[...]Michael S <[email protected]> writes:
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Here is my second fastest fibonacci calculation code (for
relatively small inputs):
typedef long unsigned long ULL;
ULL
fibonacci( unsigned n ){
ULL b = n&1, a = b^1;
if( n & 2 ) a += b, b += a;
if( n & 4 ) a += b, b += a, a += b, b += a;
if( n & 8 ){
ULL na = 13*a+21*b, nb = 21*a+34*b;
a = na, b = nb;
}
n >>= 4;
while( n-- ){
ULL na = 610*a + 987*b, nb = 987*a + 1597*b;
a = na, b = nb;
}
return b;
}
From BigO perspective this code looks like it's still O(n) so no
better than simple loop.
It is O(n) but it has a smaller constant.
According to my measurement gear, in range 0 to 92 there are few
points where it is faster than simple loop, but in majority of
cases it is slower.
I'm at a loss to understand how this could happen. In my own
measurements, the code shown above runs faster than a simple loop in
all cases above n > 12, more than twice as fast when n > 17, more
than three times as fast when n > 42, and going up from there. What
might account for these radically different results?
On Wed, 18 Dec 2024 01:33:42 +0200
Michael S <[email protected]> wrote:
I am pretty sure that better variant exists, but for tonight it's
enough.
Morning fibs.
Recursive:
struct pair_t {
long long a,b;
};
// return .a=fib(n), .b=fib(n+1)
static struct pair_t fib2(long n)
{
if (n <= 0) {
struct pair_t ret = { .a = 0, .b = n==0 ? 1 : 0 };
return ret;
}
// for n > 0
// fib(n*2) = (fib(n-1)+fib(n+1))*fib(n)
// fib(n*2+1) = fib(n)*fib(n)+fib(n+1)*fib(n+1)
long m = (n-1) >> 1;
struct pair_t ret = fib2(m);
long long a = ret.a, b = ret.b;
long long c = a + b;
if ((n & 1)==0) { // (m,m+1) => ((m+1)*2,(m+1)*2*2+1)
ret.a = (a + c)*b;
ret.b = b*b + c*c;
} else { // (m,m+1) => (m*2+1,(m+1)*2)
ret.a = a*a + b*b;
ret.b = (a + c)*b;
}
return ret;
}
static long long fib(long n)
{
struct pair_t x = fib2(n-1);
return x.b;
}
Iterative:
static long long fib(long n)
{
if (n <= 0)
return 0;
// find MS bit
unsigned long tmp = n;
unsigned long bit = 1;
while (tmp > 1) {
bit += bit;
tmp /= 2;
}
// for n > 0
// fib(n*2) = (fib(n-1)+fib(n+1))*fib(n)
// fib(n*2+1) = fib(n)*fib(n)+fib(n+1)*fib(n+1)
long long a = 0, b = 1;
while (bit > 1) {
long long c = a + b; // fib(n+1)
bit /= 2;
if ((n & bit)==0) { // (n-1,n) => (n*2-1,n*2)
c += a;
a = a*a + b*b;
b = c*b;
} else { // (n-1,n) => (n*2,n*2+1)
a = (a + c)*b;
b = b*b + c*c;
}
}
return b;
}
Both variants above are O(log(n)).
In practice for n in range 0 to 92 my measurement gear does not detect differences in speed vs simple loop. If anything, simple loop is a
little faster.
On Wed, 18 Dec 2024 09:45:38 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Tue, 17 Dec 2024 12:54:19 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...]
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Here is my second fastest fibonacci calculation code (for
relatively small inputs):
typedef long unsigned long ULL;
ULL
fibonacci( unsigned n ){
ULL b = n&1, a = b^1;
if( n & 2 ) a += b, b += a;
if( n & 4 ) a += b, b += a, a += b, b += a;
if( n & 8 ){
ULL na = 13*a+21*b, nb = 21*a+34*b;
a = na, b = nb;
}
n >>= 4;
while( n-- ){
ULL na = 610*a + 987*b, nb = 987*a + 1597*b;
a = na, b = nb;
}
return b;
}
From BigO perspective this code looks like it's still O(n) so no
better than simple loop.
It is O(n) but it has a smaller constant.
According to my measurement gear, in range 0 to 92 there are few
points where it is faster than simple loop, but in majority of
cases it is slower.
I'm at a loss to understand how this could happen. In my own
measurements, the code shown above runs faster than a simple loop in
all cases above n > 12, more than twice as fast when n > 17, more
than three times as fast when n > 42, and going up from there. What
might account for these radically different results?
May be, number of repetitions?
I run the test only once. That gives relative advantage to smaller
code which is less sensitive to cold ICache and to cold branch
predictors.
Also, because I am running it only once, my test is not so good in
seeing differences between various "good" algorithms.
But I feel that running test once is more realistic.
Iterative:
static long long fib(long n)
{
if (n <= 0)
return 0;
// find MS bit
unsigned long tmp = n;
unsigned long bit = 1;
while (tmp > 1) {
bit += bit;
tmp /= 2;
}
// for n > 0
// fib(n*2) = (fib(n-1)+fib(n+1))*fib(n)
// fib(n*2+1) = fib(n)*fib(n)+fib(n+1)*fib(n+1)
long long a = 0, b = 1;
while (bit > 1) {
long long c = a + b; // fib(n+1)
bit /= 2;
if ((n & bit)==0) { // (n-1,n) => (n*2-1,n*2)
c += a;
a = a*a + b*b;
b = c*b;
} else { // (n-1,n) => (n*2,n*2+1)
a = (a + c)*b;
b = b*b + c*c;
}
}
return b;
}
D:\CmdLine\C_Cpp\Chrono06>program
Testing fibonacci function
fib(92) = 7540113804746346429
Time taken by fib: 0 microseconds
D:\CmdLine\C_Cpp\Chrono06>program
Testing fibonacci function
fib(92) = 7540113804746346429
Time taken by fib: 0 microseconds
D:\CmdLine\C_Cpp\Chrono06>program
Testing fibonacci function
fib(92) = 7540113804746346429
Time taken by fib: 0 microseconds
D:\CmdLine\C_Cpp\Chrono06>program
Testing fibonacci function
fib(92) = 7540113804746346429
Time taken by fib: 0 microseconds
Michael S <[email protected]> writes:
On Wed, 18 Dec 2024 09:45:38 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Tue, 17 Dec 2024 12:54:19 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...]
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
static long long fib(long n)
{
if (fib <= 0)
return 0;
long long f0 = 0, f1 = 1;
for (long i = 1; i < n; ++i) {
long long f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f1;
}
Here is my second fastest fibonacci calculation code (for
relatively small inputs):
typedef long unsigned long ULL;
ULL
fibonacci( unsigned n ){
ULL b = n&1, a = b^1;
if( n & 2 ) a += b, b += a;
if( n & 4 ) a += b, b += a, a += b, b += a;
if( n & 8 ){
ULL na = 13*a+21*b, nb = 21*a+34*b;
a = na, b = nb;
}
n >>= 4;
while( n-- ){
ULL na = 610*a + 987*b, nb = 987*a + 1597*b;
a = na, b = nb;
}
return b;
}
From BigO perspective this code looks like it's still O(n) so no
better than simple loop.
It is O(n) but it has a smaller constant.
According to my measurement gear, in range 0 to 92 there are few
points where it is faster than simple loop, but in majority of
cases it is slower.
I'm at a loss to understand how this could happen. In my own
measurements, the code shown above runs faster than a simple loop
in all cases above n > 12, more than twice as fast when n > 17,
more than three times as fast when n > 42, and going up from
there. What might account for these radically different results?
May be, number of repetitions?
I run the test only once. That gives relative advantage to smaller
code which is less sensitive to cold ICache and to cold branch
predictors.
That's an interesting idea. Can you also run a measurement where
the code is run inside loops? I think it would be instructive
to compare the results under the two approaches.
(The server I use seems not to have the necessary support to
measure very fast code segments that are run only once.)
Also, because I am running it only once, my test is not so good in
seeing differences between various "good" algorithms.
But I feel that running test once is more realistic.
I think an argument could be made that repeatedly running the code
in a loop gives a more relevant result. Any short code segment that
is run only a handful of times will never have a significant impact
on overall program performance. It's better to optimize under the
assumption that the code will be "hot": if it is hot then the
optimization will be worthwhile, and if it isn't hot then the loss
will be way down in the noise and not worth worrying about.
Michael S <[email protected]> writes:
On Wed, 18 Dec 2024 01:33:42 +0200
Michael S <[email protected]> wrote:
I am pretty sure that better variant exists, but for tonight it's
enough.
Morning fibs.
Recursive:
struct pair_t {
long long a,b;
};
// return .a=fib(n), .b=fib(n+1)
static struct pair_t fib2(long n)
{
if (n <= 0) {
struct pair_t ret = { .a = 0, .b = n==0 ? 1 : 0 };
return ret;
}
// for n > 0
// fib(n*2) = (fib(n-1)+fib(n+1))*fib(n)
// fib(n*2+1) = fib(n)*fib(n)+fib(n+1)*fib(n+1)
long m = (n-1) >> 1;
struct pair_t ret = fib2(m);
long long a = ret.a, b = ret.b;
long long c = a + b;
if ((n & 1)==0) { // (m,m+1) => ((m+1)*2,(m+1)*2*2+1)
ret.a = (a + c)*b;
ret.b = b*b + c*c;
} else { // (m,m+1) => (m*2+1,(m+1)*2)
ret.a = a*a + b*b;
ret.b = (a + c)*b;
}
return ret;
}
static long long fib(long n)
{
struct pair_t x = fib2(n-1);
return x.b;
}
I may have inadvertently misled you. Here is a simple linear
formulation that uses recursion:
typedef unsigned long long NN;
static NN fibonacci_3( NN, NN, unsigned );
NN
fibonacci( unsigned n ){
return fibonacci_3( 1, 0, n );
}
NN
fibonacci_3( NN a, NN b, unsigned n ){
return n == 0 ? b : fibonacci_3( b, a+b, n-1 );
}
Can you apply this idea to your logarithmic scheme?
On Thu, 19 Dec 2024 00:57:20 -0800[...]
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Wed, 18 Dec 2024 09:45:38 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...]According to my measurement gear, in range 0 to 92 there are few
points where [the code I posted] is faster than simple loop, but
in majority of cases it is slower.
I'm at a loss to understand how this could happen. In my own
measurements, the code shown above runs faster than a simple loop
in all cases above n > 12, more than twice as fast when n > 17,
more than three times as fast when n > 42, and going up from
there. What might account for these radically different results?
May be, number of repetitions?
I run the test only once. That gives relative advantage to smaller
code which is less sensitive to cold ICache and to cold branch
predictors.
That's an interesting idea. Can you also run a measurement where
the code is run inside loops? I think it would be instructive
to compare the results under the two approaches.
I feel that running fib(n) with the same n in loop too unrealistic.
So I decided to run, at least, with different values of n.
Ended up spending about a hour just to build a test bench.
The answer is - in a loop of more than dozen iterations your code is
indeed faster. Esp. so for hundred or more iterations.
Here is my test bench. [...]
On Thu, 19 Dec 2024 00:40:32 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Wed, 18 Dec 2024 01:33:42 +0200
Michael S <[email protected]> wrote:
[a recursive logarithmic formulation]
I may have inadvertently misled you. Here is a simple linear
formulation that uses recursion:
typedef unsigned long long NN;
static NN fibonacci_3( NN, NN, unsigned );
NN
fibonacci( unsigned n ){
return fibonacci_3( 1, 0, n );
}
NN
fibonacci_3( NN a, NN b, unsigned n ){
return n == 0 ? b : fibonacci_3( b, a+b, n-1 );
}
Can you apply this idea to your logarithmic scheme?
Not really. But I can cheat.
Below is slightly modified iterative algorithm coded as recursion.
static
long long fib_3(long long a, long long b, unsigned long rev_n) {
if (rev_n == 1)
return b;
long long c = a + b;
if ((rev_n & 1) == 0)
return fib_3(a*a+b*b, (a+c)*b, rev_n/2);
else
return fib_3((a+c)*b, b*b+c*c, rev_n/2);
}
static
long long fib_bitrev(unsigned long a, unsigned long b) {
return a==1 ? fib_3(0, 1, b) : fib_bitrev( a / 2, b * 2 + a % 2);
}
long long fib(long n) {
return n <= 0 ? 0 : fib_bitrev(n, 1);
}
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
By the way, I send you an email earlier today. Did that get to
you?
I ran a python version to compute fibonacci( 10000000 ). This code
ran 30% faster than the previous version, which I would be is
almost entirely due to the expression factoring with 'c'. Great!
On Fri, 20 Dec 2024 14:31:54 -0800
Tim Rentsch <[email protected]> wrote:
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
Any x86-64 Linux should have _rdtsc() as inline function (or built-in) defined in x86intrin.h. No installation required.
On 12/21/24 17:14, Michael S wrote:
On Fri, 20 Dec 2024 14:31:54 -0800
Tim Rentsch <[email protected]> wrote:
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
Any x86-64 Linux should have _rdtsc() as inline function (or
built-in) defined in x86intrin.h. No installation required.
On my x86-64 Linux system, that header (indirectly) declares
__rdtsc(), but not _rdtsc.
On Fri, 20 Dec 2024 14:31:54 -0800
Tim Rentsch <[email protected]> wrote:
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
Any x86-64 Linux should have _rdtsc() as inline function (or built-in) defined in x86intrin.h. No installation required.
On Fri, 20 Dec 2024 14:59:07 -0800
Tim Rentsch <[email protected]> wrote:
I ran a python version to compute fibonacci( 10000000 ). This code
ran 30% faster than the previous version, which I would be is
almost entirely due to the expression factoring with 'c'. Great!
This one does fib(10M) in 89 msec on my very old home PC.
[code]
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:31:54 -0800
Tim Rentsch <[email protected]> wrote:
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
Any x86-64 Linux should have _rdtsc() as inline function (or
built-in) defined in x86intrin.h. No installation required.
Okay. As it turns out knowing that doesn't help me, but thank
you for the info. I'm taking this as a sign from heaven that
I should be spending my energies somewhere other than measuring
performance.
On Sun, 22 Dec 2024 10:33:57 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:31:54 -0800
Tim Rentsch <[email protected]> wrote:
Thank you for that. Sadly I am not able to run it because my
test environment is lacking _rdtsc(), and I haven't been able
to find out how to install it.
Any x86-64 Linux should have _rdtsc() as inline function (or
built-in) defined in x86intrin.h. No installation required.
Okay. As it turns out knowing that doesn't help me, but thank
you for the info. I'm taking this as a sign from heaven that
I should be spending my energies somewhere other than measuring
performance.
Did you see correction of James Kuyper?
Under Linux it is called __rdtsc().
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:59:07 -0800
Tim Rentsch <[email protected]> wrote:
I ran a python version to compute fibonacci( 10000000 ). This code
ran 30% faster than the previous version, which I would be is
almost entirely due to the expression factoring with 'c'. Great!
This one does fib(10M) in 89 msec on my very old home PC.
[code]
You are the speed king!
On Sun, 22 Dec 2024 10:25:59 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:59:07 -0800
Tim Rentsch <[email protected]> wrote:
I ran a python version to compute fibonacci( 10000000 ). This code
ran 30% faster than the previous version, which I would be is
almost entirely due to the expression factoring with 'c'. Great!
This one does fib(10M) in 89 msec on my very old home PC.
[code]
You are the speed king!
But approximately the same code in python is MUCH slower.
2.9 sec on 11 y.o. PC that still serves me as a desktop at work.
2.0 sec on 5.5 y.o. mall server.
I don't really understand why.
I expected that nearly all time is spend in multiplication of few huge numbers, probably over 75% of time in last couple of steps (6 multiplications). And I was under impression that internally Python
uses the same GMP library that I used in C. So I expected the same
speed, more or less. May be, 1.5x slowdown because of less efficient
memory handling in Python. 35x difference is a big surprise.
Here is my python routine.
def fib(n):
if n < 1:
return 0
tmp = n
msb = 1
while tmp > 1:
msb += msb
tmp //= 2
f0=0; f1=1;
while msb > 1:
ff0 = f0*f0 + f1*f1
f1 = (f0+f0+f1)*f1
f0 = ff0
msb //= 2
if msb & n:
f0 = f1
f1 += ff0
return f1
Michael S <[email protected]> writes:
On Sun, 22 Dec 2024 10:25:59 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:59:07 -0800
Tim Rentsch <[email protected]> wrote:
I ran a python version to compute fibonacci( 10000000 ). This
code ran 30% faster than the previous version, which I would be
is almost entirely due to the expression factoring with 'c'.
Great!
This one does fib(10M) in 89 msec on my very old home PC.
[code]
You are the speed king!
But approximately the same code in python is MUCH slower.
2.9 sec on 11 y.o. PC that still serves me as a desktop at work.
2.0 sec on 5.5 y.o. mall server.
That matches my experience with doing this in python.
I don't really understand why.
I expected that nearly all time is spend in multiplication of few
huge numbers, probably over 75% of time in last couple of steps (6 multiplications). And I was under impression that internally Python
uses the same GMP library that I used in C. So I expected the same
speed, more or less. May be, 1.5x slowdown because of less
efficient memory handling in Python. 35x difference is a big
surprise.
Yes, I have no explanation either. Your reasoning looks sound to
me.
On Tue, 24 Dec 2024 05:21:42 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Sun, 22 Dec 2024 10:25:59 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Fri, 20 Dec 2024 14:59:07 -0800
Tim Rentsch <[email protected]> wrote:
I ran a python version to compute fibonacci( 10000000 ). This
code ran 30% faster than the previous version, which I would be
is almost entirely due to the expression factoring with 'c'.
Great!
This one does fib(10M) in 89 msec on my very old home PC.
[code]
You are the speed king!
But approximately the same code in python is MUCH slower.
2.9 sec on 11 y.o. PC that still serves me as a desktop at work.
2.0 sec on 5.5 y.o. mall server.
That matches my experience with doing this in python.
I don't really understand why.
I expected that nearly all time is spend in multiplication of few
huge numbers, probably over 75% of time in last couple of steps (6 multiplications). And I was under impression that internally
Python uses the same GMP library that I used in C. So I expected
the same speed, more or less. May be, 1.5x slowdown because of
less efficient memory handling in Python. 35x difference is a big surprise.
Yes, I have no explanation either. Your reasoning looks sound to
me.
I did few more time measurements both with GMP and with Python. It
clearly showed that they use not just different implementations of
bignum multiplication, but different algorithms.
GMP multiplication for operands in range of above ~200,000 bits is approximately O(N*log(N)) with number of bits, or just a little bit
slower than that.
Python multiplication scale much worse, approximately as (N**1.55).
And indeed after binging more I had seen the following discussion on
reddit: https://www.reddit.com/r/Python/comments/7h62ul/python_largeinteger_computation_speed_a_comparison/
A relevant quote:
"Python seems to use two algorithms for bignum multiplication: "the
O(N**2) school algorithm" and Karatsuba if both operands contain more
than 70 decimal digits. (Your example reaches 70 decimal digits
already after the 5th(?) squaring, and after the 22nd there are
~17million decimal digits(?). The wast majority of the time is spent
on the last one or two multiplications i.e. Karatsuba.)
Meanwhile GMP implements seven multiplication algorithms: The same(?) "school" and Karatsuba algorithms, plus various Toom and an FFT method (Strassen), but I'm unclear on what the exact thresholds are that
select these. It seems these thresholds are in "limbs" ("the part of a multi-precision number that fits in a single machine word") instead of decimal digits, with e.g. FFT method being used after "somewhere
between 3000 and 10000 limbs", i.e. >~30k - 100k(?) decimal digits, so
it would probably be used for the relevant steps in your example."
It seem that that threshold for FFT method mentioned in the quote
applies to number of limbs (i.e. 64-bit groups of bits in our specific
case) in the product rather than in the operands. Also it is possible
that in the newer versions of GMP the threshold is a lower than it
was 7 years ago.
I wonder, why python uses its own thing instead of GMP.
Incompatible licenses or plain NIH ?
[GMP in python?]
Forgot to mention.
As said in the same redit thread, one can use GMP from python. It's
pretty easy, just not the default.
[example]
For big values of n gmpy2-based variant calculates Fibbonaci numbers at approximately the same speed as C variant. For smaller numbers - not
quite the same as C and for n < ~10,000 it is slower than default
python math.
On Tue, 24 Dec 2024 05:21:42 -0800
Tim Rentsch <[email protected]> wrote:
[python runs much slower than GMP]
Yes, I have no explanation either. Your reasoning looks sound to
me.
I did few more time measurements both with GMP and with Python. It
clearly showed that they use not just different implementations of
bignum multiplication, but different algorithms.
GMP multiplication for operands in range of above ~200,000 bits is approximately O(N*log(N)) with number of bits, or just a little bit
slower than that.
Python multiplication scale much worse, approximately as (N**1.55).
And indeed after binging more I had seen the following discussion on
reddit: [link]
A relevant quote:
"Python seems to use two algorithms for bignum multiplication: "the
O(N**2) school algorithm" and Karatsuba if both operands contain more
than 70 decimal digits. (Your example reaches 70 decimal digits already after the 5th(?) squaring, and after the 22nd there are ~17million
decimal digits(?). The wast majority of the time is spent on the last
one or two multiplications i.e. Karatsuba.)
Meanwhile GMP implements seven multiplication algorithms: The same(?) "school" and Karatsuba algorithms, plus various Toom and an FFT method (Strassen), but I'm unclear on what the exact thresholds are that
select these. It seems these thresholds are in "limbs" ("the part of a multi-precision number that fits in a single machine word") instead of decimal digits, with e.g. FFT method being used after "somewhere
between 3000 and 10000 limbs", i.e. >~30k - 100k(?) decimal digits, so
it would probably be used for the relevant steps in your example."
It seem that that threshold for FFT method mentioned in the quote
applies to number of limbs (i.e. 64-bit groups of bits in our specific
case) in the product rather than in the operands. Also it is possible
that in the newer versions of GMP the threshold is a lower than it
was 7 years ago.
I wonder, why python uses its own thing instead of GMP.
Incompatible licenses or plain NIH ?
| Sysop: | Keyop |
|---|---|
| Location: | Huddersfield, West Yorkshire, UK |
| Users: | 715 |
| Nodes: | 16 (2 / 14) |
| Uptime: | 160:08:55 |
| Calls: | 12,094 |
| Calls today: | 2 |
| Files: | 15,000 |
| Messages: | 6,517,761 |