• Re: I'm going to give up on this...

    From Michael S@21:1/5 to Vir Campestris on Thu Oct 31 12:44:24 2024
    On Thu, 31 Oct 2024 10:21:54 +0000
    Vir Campestris wrote:

    I have a piece of code that takes a couple of seconds to execute. Or
    perhaps it takes three times longer, depending on a minor change.

    The change?

    If the function declaration is
    template <typename collection>
    Masker& Masker::andequals(size_t cachesize, collection& primes)

    it is fast. But if I tell it what the collection is
    template<>
    Masker& Masker::andequals(size_t cachesize, std::vector<Prime>& primes)

    It takes three times as long to execute.

    But only with g++, and only with -Ofast optimisation. With clang++ I
    see no such effect (it's slow all the time)

    WTF?

    Andy

    In the second case it's no longer a template, right?
    So, does it become a real function instead of inline?
    What if

    static inline
    Masker& Masker::andequals(
    size_t cachesize, std::vector<Prime>& primes) {
    }
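
    A minimal sketch of the forms being compared, not the original code:
    `Prime`, the `total` member and the loop body are placeholders, since
    the thread never shows them. An explicit specialization is an ordinary
    function rather than a template, so as far as I know it is not
    implicitly inline and gets an out-of-line definition with external
    linkage, which may be what changes g++'s inlining decision. The sketch
    uses `inline` alone, on the assumption that `static` is rejected on an
    out-of-class member definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Prime = unsigned long;  // assumption: Prime is never shown in the thread

class Masker {
public:
    template <typename collection>
    Masker& andequals(std::size_t cachesize, collection& primes);

    std::size_t total = 0;  // placeholder state so the sketch does something
};

// Form 1: primary template. Each implicit instantiation may be emitted in
// every translation unit that uses it, so the body is visible to callers.
template <typename collection>
Masker& Masker::andequals(std::size_t cachesize, collection& primes) {
    assert(cachesize != 0);  // the placeholder body divides by cachesize
    for (const auto& p : primes) total += p % cachesize;
    return *this;
}

// Form 2: explicit specialization. Without `inline` this would be a single
// ordinary function definition; with `inline` it can live in a header and
// stays visible to every caller that includes it.
template <>
inline Masker& Masker::andequals(std::size_t cachesize,
                                 std::vector<Prime>& primes) {
    assert(cachesize != 0);
    for (const auto& p : primes) total += p % cachesize;
    return *this;
}

// Calls the primary template (the element type is not Prime).
std::size_t sum_generic() {
    std::vector<unsigned> primes{2, 3, 5, 7};
    Masker m;
    return m.andequals(4, primes).total;
}

// Calls the explicit specialization for std::vector<Prime>.
std::size_t sum_specialized() {
    std::vector<Prime> primes{2, 3, 5, 7};
    Masker m;
    return m.andequals(4, primes).total;
}
```

    Both helpers compute the same value; only the way the compiler sees
    the function differs, which is the part worth comparing in the
    generated assembly.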

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vir Campestris@21:1/5 to All on Thu Oct 31 10:21:54 2024
    I have a piece of code that takes a couple of seconds to execute. Or
    perhaps it takes three times longer, depending on a minor change.

    The change?

    If the function declaration is
    template <typename collection>
    Masker& Masker::andequals(size_t cachesize, collection& primes)

    it is fast. But if I tell it what the collection is
    template<>
    Masker& Masker::andequals(size_t cachesize, std::vector<Prime>& primes)

    It takes three times as long to execute.

    But only with g++, and only with -Ofast optimisation. With clang++ I see
    no such effect (it's slow all the time)

    WTF?

    Andy

  • From Sam@21:1/5 to Vir Campestris on Thu Oct 31 07:47:00 2024
    Vir Campestris writes:

    I have a piece of code that takes a couple of seconds to execute. Or perhaps it takes three times longer, depending on a minor change.

    The change?

    If the function declaration is
    template <typename collection>
    Masker& Masker::andequals(size_t cachesize, collection& primes)

    it is fast. But if I tell it what the collection is
    template<>
    Masker& Masker::andequals(size_t cachesize, std::vector<Prime>& primes)

    It takes three times as long to execute.

    But only with g++, and only with -Ofast optimisation. With clang++ I see no such effect (it's slow all the time)

    WTF?

    We are very sorry to inform you that all of our psychics are on their
    lunch break, right now, and there's nobody left in the office who's
    certified to operate our mind ray-beam machine that's required to
    extract the actual code, for a detailed inspection and analysis, out
    of your head.

    We apologize for the inconvenience.

  • From Vir Campestris@21:1/5 to Michael S on Thu Oct 31 16:43:50 2024
    On 31/10/2024 10:44, Michael S wrote:
    In the second case it's no longer a template, right?
    So, does it become a real function instead of inline?
    What if

    static inline
    Masker& Masker::andequals(
    size_t cachesize, std::vector<Prime>& primes) {
    }

    That does indeed make it fast.

    So all that guff we've ever said about making things modular, and
    breaking functions up into small parts? If you want speed, don't do it.

    (clang speed is unaffected).

    And Sam, when your psychics come back from lunch let me know ;)

    Andy

  • From Mad Hamish@21:1/5 to [email protected] on Sun Nov 3 21:46:07 2024
    On Thu, 31 Oct 2024 16:43:50 +0000, Vir Campestris wrote:

    On 31/10/2024 10:44, Michael S wrote:
    In the second case it's no longer a template, right?
    So, does it become a real function instead of inline?
    What if

    static inline
    Masker& Masker::andequals(
    size_t cachesize, std::vector<Prime>& primes) {
    }

    That does indeed make it fast.

    So all that guff we've ever said about making things modular, and
    breaking functions up into small parts? If you want speed, don't do it.

    Generally a minor loss in speed is much less important than people
    being able to tell what's actually happening.

    (clang speed is unaffected).

    And Sam, when your psychics come back from lunch let me know ;)

    Andy

  • From Michael S@21:1/5 to Vir Campestris on Sun Nov 3 13:41:18 2024
    On Thu, 31 Oct 2024 16:43:50 +0000
    Vir Campestris wrote:

    On 31/10/2024 10:44, Michael S wrote:
    In the second case it's no longer a template, right?
    So, does it become a real function instead of inline?
    What if

    static inline
    Masker& Masker::andequals(
    size_t cachesize, std::vector<Prime>& primes) {
    }

    That does indeed make it fast.

    So all that guff we've ever said about making things modular, and
    breaking functions up into small parts? If you want speed, don't do
    it.


    I don't quite understand how it follows. Didn't you get it both modular
    and fast?

    BTW, I strongly suspect that it would be equally fast when you do not
    specify 'inline'.
    I also suspect, but less strongly, that if you opt for either Global
    Program Optimization or Link-time Code Generation then it would be
    equally fast even without 'static'. Less strongly, because of absence
    of 1st-hand experience with GPO/LTCG on gcc tools.
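
    On GCC the usual name for this is link-time optimisation, requested
    with -flto. A rough sketch, assuming a hypothetical split into
    masker.h, masker.cpp and main.cpp (the file names, `Prime`, `total`
    and the function body are placeholders, not the original code). It is
    shown as one listing so that it compiles on its own, and whether it
    recovers the speed in this case is untested.

```cpp
// Hypothetical build, passing -flto when compiling and when linking:
//   g++ -Ofast -flto -c masker.cpp
//   g++ -Ofast -flto -c main.cpp
//   g++ -Ofast -flto masker.o main.o -o sieve
#include <cassert>
#include <cstddef>
#include <vector>

// ---- masker.h (hypothetical) ----
using Prime = unsigned long;  // assumption: Prime is never shown in the thread

struct Masker {
    std::size_t total = 0;  // placeholder state
    Masker& andequals(std::size_t cachesize, std::vector<Prime>& primes);
};

// ---- masker.cpp (hypothetical) ----
// Neither static nor inline: without -flto, code in main.o only sees the
// declaration and has to make a real call. With -flto the optimiser can
// look across object files at link time and may inline it.
Masker& Masker::andequals(std::size_t cachesize, std::vector<Prime>& primes) {
    assert(cachesize != 0);  // the placeholder body divides by cachesize
    for (const auto& p : primes) total += p % cachesize;
    return *this;
}

// ---- main.cpp (hypothetical) ----
std::size_t run() {
    std::vector<Prime> primes{2, 3, 5, 7};
    Masker m;
    return m.andequals(4, primes).total;
}
```

    Timing the same sources built with and without -flto would settle the
    question.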

  • From Vir Campestris@21:1/5 to Michael S on Mon Nov 4 21:24:55 2024
    On 03/11/2024 11:41, Michael S wrote:
    I don't quite understand how it follows. Didn't you get it both modular
    and fast?

    BTW, I strongly suspect that it would be equally fast when you do not
    specify 'inline'.
    I also suspect, but less strongly, that if you opt for either Global
    Program Optimization or Link-time Code Generation then it would be
    equally fast even without 'static'. Less strongly, because of absence
    of 1st-hand experience with GPO/LTCG on gcc tools.

    By using inline, and putting all the code in the same file, I got the
    speed. I only worked this out after I'd broken the functions out into
    separate source files (and object files) to find the code was always slow.

    Andy

  • From Vir Campestris@21:1/5 to Mad Hamish on Mon Nov 4 21:30:44 2024
    On 03/11/2024 10:46, Mad Hamish wrote:
    Generally a minor loss in speed is much less important than people
    being able to tell what's actually happening

    Requirement one: The code works correctly.
    Requirement two: Your fellow workers can understand why.
    Requirement three: It's fast.

    There are cases where high speed is a requirement, but in my experience
    they were rare. And usually benchmarks ;)

    But there's no point in having a program that is really, really fast
    but produces the wrong answer.

    When under some rare circumstances your code does something bad, your
    fellow worker might have to fix it, which is why #2.

    Andy
