I have a piece of code that takes a couple of seconds to execute. Or
perhaps it takes three times longer, depending on a minor change.
The change?
If the function declaration is
template <typename collection>
Masker& Masker::andequals(size_t cachesize, collection& primes)
it is fast. But if I tell it what the collection is
template<>
Masker& Masker::andequals(size_t cachesize, std::vector<Prime>& primes)
it takes three times as long to execute.
But only with g++, and only with -Ofast optimisation. With clang++ I
see no such effect (it's slow all the time).
WTF?
Andy
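For readers following along, here is a minimal sketch of the two forms side by side. The real Masker and Prime were never posted, so the class body, the Prime struct and the two counters below are hypothetical stand-ins. The point it tries to show: the generic definition is instantiated implicitly wherever it is used, while the `template<>` form is an explicit specialisation, which is an ordinary function and is not implicitly inline. That difference may be what g++'s inliner is reacting to, though the thread does not confirm it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: the real Prime and Masker were not posted.
struct Prime { unsigned value; };

class Masker {
public:
    // Primary member template; both forms below refer to this declaration.
    template <typename collection>
    Masker& andequals(std::size_t cachesize, collection& primes);

    // Illustrative counters so the two definitions can be told apart.
    int generic_calls = 0;
    int specialised_calls = 0;
};

// Form 1: the generic definition. Each use instantiates it implicitly.
template <typename collection>
Masker& Masker::andequals(std::size_t cachesize, collection& primes) {
    (void)cachesize;
    (void)primes;
    ++generic_calls;
    return *this;
}

// Form 2: an explicit specialisation for std::vector<Prime>. This is an
// ordinary function rather than a template, and it is not implicitly inline.
template <>
Masker& Masker::andequals(std::size_t cachesize, std::vector<Prime>& primes) {
    (void)cachesize;
    (void)primes;
    ++specialised_calls;
    return *this;
}
```

Calling andequals with a std::vector<Prime> picks the specialisation; any other collection type goes through the generic definition.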
In the second case it's no longer a template, right?
So does it become a real function instead of an inline one?
What if you write:
static inline
Masker& Masker::andequals(
size_t cachesize, std::vector<Prime>& primes) {
}
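One possible reading of that suggestion, as a sketch only. As far as I know, compilers reject `static` on an out-of-class member function definition, so the snippet above is presumably shorthand. The version below assumes the hot work can move into a free helper with internal linkage; count_below and below_cache are hypothetical names, and Prime is again a stand-in.

```cpp
#include <cstddef>
#include <vector>

struct Prime { unsigned value; };  // hypothetical stand-in

// Internal linkage: every caller is in this translation unit, so the
// optimiser is free to inline the body and drop the out-of-line copy.
static inline std::size_t count_below(std::size_t limit,
                                      const std::vector<Prime>& primes) {
    std::size_t n = 0;
    for (const Prime& p : primes) {
        if (p.value < limit) ++n;
    }
    return n;
}

class Masker {
public:
    // Defined inside the class body, so implicitly inline.
    Masker& andequals(std::size_t cachesize, std::vector<Prime>& primes) {
        below_cache += count_below(cachesize, primes);
        return *this;
    }
    std::size_t below_cache = 0;  // illustrative state only
};
```

Whether this matches what was actually compiled is unknown; it only shows a non-template definition that the compiler can see in full at the call site.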
On 31/10/2024 10:44, Michael S wrote:
> In the second case it's no longer a template, right?
> So does it become a real function instead of an inline one?
> What if you write:
> static inline
> Masker& Masker::andequals(
> size_t cachesize, std::vector<Prime>& primes) {
> }
That does indeed make it fast.
So all that guff we've ever said about making things modular, and
breaking functions up into small parts? If you want speed, don't do it.
(clang speed is unaffected).
And Sam, when your psychics come back from lunch let me know ;)
Andy
I don't quite understand how that follows. Didn't you get it both modular
and fast?
BTW, I strongly suspect that it would be equally fast if you did not
specify 'inline'.
I also suspect, but less strongly, that if you opt for either Global
Program Optimization or Link-time Code Generation then it would be
equally fast even without 'static'. Less strongly, because I have no
first-hand experience with GPO/LTCG on gcc tools.
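Both suspicions are easy to check by measurement. A minimal timing sketch, assuming a steady_clock wall-clock measurement is good enough for a workload that runs for seconds; best_time_ms is a hypothetical helper, and the file names in the build comments are made up. The gcc option for link-time optimisation is -flto.

```cpp
#include <chrono>

// Hypothetical build lines for comparing the variants (file names assumed):
//   g++ -Ofast masker.cpp main.cpp -o bench
//   g++ -Ofast -flto masker.cpp main.cpp -o bench_lto

// Runs `work` `repeats` times and returns the best wall-clock time in
// milliseconds, or -1.0 if `repeats` is not positive.
template <typename F>
double best_time_ms(F&& work, int repeats = 5) {
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}
```

Wrapping the andequals call in a lambda and passing it to best_time_ms for each build (with and without 'inline', 'static' and -flto) would show which of them actually matters here.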
Generally a minor loss in speed is much less important than people
being able to tell what's actually happening.