• Efficiency of in-order vs. OoO (was: Concertina II Progress)

    From Anton Ertl@21:1/5 to MitchAlsup on Sun Jan 7 08:13:47 2024
    [email protected] (MitchAlsup) writes:
    >OoO costs roughly 3× In Order power and provides 1.4× performance (hand
    >waving accuracy).

    Fortunately, we have measurement data, so we do not need to rely on
    handwaving:

    <https://images.anandtech.com/doci/14072/Exynos9820-Perf-Estimated_575px.png>
    <https://images.anandtech.com/doci/14072/Exynos9820-Perf-Eff-Estimated.png>

    from the article

    <https://www.anandtech.com/show/14072/the-samsung-galaxy-s10plus-review/4>

    In the Exynos 9820, we see at different points of the DVFS curve:

       A55 (in-order)    |      A75 (OoO)
     perf    mW   pf/mW  |  perf    mW   pf/mW
      1.0    22   0.046  |   3.7    88   0.042   highest efficiency point for each core
      1.4    33   0.042  |   3.7    88   0.042   same pf/mW at highest common efficiency
      2.7    90   0.030  |   3.7    88   0.042   same mW at lowest common mW
      5.1   400   0.013  |   5.1   124   0.041   same perf at highest common performance
      5.1   400   0.013  |  10.5   400   0.027   same mW at highest common mW
      5.1   400   0.013  |  17.2  1270   0.013   highest performance point for each core

    "perf" is SPEC2006 Int+FP Geomean. "pf/mW" (shown as "Perf/W" in the
    second graph) is SPEC Int+FP Geomean/mW (you can confirm this by
    computing corresponding numbers from the first graph).
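
The confirmation suggested above can be sketched in a few lines of Python. The rows are transcribed from the table in this post (which was itself read off the graphs), so agreement is only expected to within reading error; the names `ROWS`, `efficiency` and `max_deviation` are made up for this illustration.

```python
# Rows transcribed from the table above: (core, perf, mW, listed pf/mW).
# The values were read off graphs, so small deviations are expected.
ROWS = [
    ("A55", 1.0, 22, 0.046),
    ("A55", 1.4, 33, 0.042),
    ("A55", 2.7, 90, 0.030),
    ("A55", 5.1, 400, 0.013),
    ("A75", 3.7, 88, 0.042),
    ("A75", 5.1, 124, 0.041),
    ("A75", 10.5, 400, 0.027),
    ("A75", 17.2, 1270, 0.013),
]


def efficiency(perf, mw):
    """Performance per milliwatt for one DVFS point."""
    return perf / mw


def max_deviation(rows):
    """Largest absolute gap between recomputed and listed pf/mW."""
    return max(abs(efficiency(p, mw) - listed) for _, p, mw, listed in rows)


if __name__ == "__main__":
    for core, p, mw, listed in ROWS:
        print(f"{core} perf={p:5.1f} mW={mw:5d} "
              f"computed={efficiency(p, mw):.4f} listed={listed:.3f}")
    print(f"max deviation: {max_deviation(ROWS):.5f}")
```

Every recomputed value lands within 0.001 of the listed one, which is about what one would expect from numbers read off a plot.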

    So, at the highest efficiency point for each core, the OoO A75
    consumes 4 times the power and delivers 3.7 times the performance of
    the A55. As soon as you need a little more performance, the
    efficiency of the A55 drops to the same level as the A75 (e.g., the
    A75 at 88mW delivers about 2.6 times the performance of the A55 at
    33mW for about 2.7 times the power), but until the A55 reaches the
    lowest power consumption of the A75 at 88mW, the A55 still fills a
    niche; at that power consumption, the A75 delivers 1.4 times the
    performance of the A55. There is no reason to use the A55 beyond
    this point if an A75 is free. And beyond 170mW, even the Exynos M4
    outcompetes the A55 in every respect.
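
The ratios quoted in this paragraph follow directly from the table rows. A minimal sketch, using only (perf, mW) pairs from the table; the dictionary keys and the `ratios` helper are names invented for this illustration.

```python
# (perf, mW) operating points taken from the table above.
A55 = {
    "best_eff": (1.0, 22),    # A55 at its highest efficiency point
    "eff_match": (1.4, 33),   # A55 where its pf/mW matches the A75's
    "near_88mW": (2.7, 90),   # A55 at roughly the A75's lowest power
}
A75_LOWEST = (3.7, 88)        # A75 at its lowest (and most efficient) point


def ratios(a, b):
    """Return (perf ratio, power ratio) of point a relative to point b."""
    return a[0] / b[0], a[1] / b[1]


if __name__ == "__main__":
    for name, point in A55.items():
        perf_x, power_x = ratios(A75_LOWEST, point)
        print(f"A75 vs A55 {name:10s}: {perf_x:.2f}x perf at {power_x:.2f}x power")
```

The third comparison is the one that matters for the niche argument: at essentially equal power the A75 is about 1.4 times as fast, so past that point the A55 offers nothing the A75 does not.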

    If there are more threads than A75 and M4 cores, it's an interesting
    question if it is beneficial for power consumption to shift some of
    the work to A55 cores and run the A75 and M4 at a correspondingly
    better efficiency point. As long as the perf/W on the A55 is not
    worse than the original perf/W on the other cores, that should help
    (at least if the threads don't have to talk to each other too much),
    but given the low performance of the A55, especially where it is
    efficient, it won't help much.
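
To put a rough number on "won't help much", here is a toy calculation using only table rows. It assumes perfectly independent threads whose throughput simply adds across cores, and it ignores uncore power, migration cost and communication; those assumptions are mine and not something the measurements establish. The names below are invented for the sketch.

```python
# (perf, mW) points from the table above.
A75_ALONE = (5.1, 124)       # one A75 delivering 5.1 on its own
A75_BACKED_OFF = (3.7, 88)   # the same A75 at its most efficient point
A55_HELPER = (1.4, 33)       # an A55 at matching pf/mW picks up the rest


def combine(*points):
    """Sum perf and power, assuming throughput is additive across cores."""
    perf = sum(p for p, _ in points)
    mw = sum(m for _, m in points)
    return perf, mw


if __name__ == "__main__":
    split_perf, split_mw = combine(A75_BACKED_OFF, A55_HELPER)
    saved = A75_ALONE[1] - split_mw
    print(f"A75 alone:  perf {A75_ALONE[0]:.1f} at {A75_ALONE[1]} mW")
    print(f"A75 + A55:  perf {split_perf:.1f} at {split_mw} mW")
    print(f"saving: {saved} mW ({100 * saved / A75_ALONE[1]:.1f}%)")
```

Under these assumptions the split delivers the same total throughput for a few mW less, which is well within the reading error of the graphs.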

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>
