• Re: Assembling span-dependent instructions

    From Kaz Kylheku@21:1/5 to Anton Ertl on Wed Jul 27 22:52:40 2022
    XPost: comp.arch

    On 2022-07-27, Anton Ertl <[email protected]> wrote:
    However, one can also construct cases where making the code larger can
    reduce the minimum size of the immediate operand, e.g.:

    foo:
    movl foo+133-bar(%rdi),%eax
    bar:

    That's weird; what is accessed this way, relative to the code,
    and does it occur in compiler output?

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Thu Jul 28 16:02:16 2022
    Anton,

    Curiously, I recently used that example (limited to jumps) to explain
    something on Quora, how two different approaches can yield different
    answers to the same question.

    You can make both, the large-and-shrink, and small-and-grow approaches
    converge (just ban doing the opposite, once shrunk an instruction cannot
    regrow or once grown an instruction cannot reshrink). And under certain assumptions, you can show that the small-and-grow result will always be
    "more optimal" (smaller) than the large-and-shrink result. However, one of those assumptions is that shrinking some instruction will not require a different instruction to grow--the results must be monotonic. That is not always true in real-life instruction sets.

    More relevant is that the order or growing or shrinking instructions can
    change which other instructions grow or shrink and that means picking which
    one to grow or shrink is important and that contributes to the
    NP-completeness of the problem. You potentially have to try all possible orders of growing/shrinking to find the optimal set.

    There are even other aspects that impact real-life implementations, such as aligned instructions possibly executing faster, such that one isn't just looking for the smallest size, but the smallest size that runs fastest.

    Nice to see someone else remembering the same issues.

    Kind regards,
    Chris

    -- ******************************************************************************

    Chris Clark email:
    [email protected]
    Compiler Resources, Inc. Web Site: http://world.std.com/~compres
    23 Bailey Rd voice: (508) 435-5016
    Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to [email protected] on Fri Jul 29 14:22:01 2022
    On Friday, July 29, 2022 at 1:38:25 PM UTC-7, [email protected] wrote:

    (snip)

    Z architecture (modern versions of IBM 360) has
    such problems too: there are variants of instruction having different
    lengths but even longest variant have limited range of available
    offsets. At least some versions of Z architecture had severe penalty
    for simultaneusly accessing the same cache line for instruction fetch
    and data access, so putting constant pools in separate cache line was
    very important.

    I presume this is true for any system with separate data/instuction
    cache. It might be more of a problem for z/, with especially long
    cache lines.

    From the S/360 days, it was usual for data, even variable data, to be
    close to code. That is, for non-reentrant programs. Most assembly
    code and Fortran did that.

    Otherwise, I believe the original question comes up on any machine
    with variable sized branch instructions. Many of the stories I
    remember are from the PDP-8.

    A similar question comes up generating the Table of Contents
    with LaTeX. When you run it, it generates the file used to make
    the ToC next time. You run it again to generate the ToC, and
    it will tell you if anything moved since the time before.
    Hopefully it converges.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Christopher F Clark on Sat Jul 30 09:28:06 2022
    Christopher F Clark <[email protected]> writes:
    You can make both, the large-and-shrink, and small-and-grow approaches >converge (just ban doing the opposite, once shrunk an instruction cannot >regrow or once grown an instruction cannot reshrink).

    My example

    foo:
    movl foo+133-bar(%rdi),%eax
    bar:

    is one where start-large-and-shrink fails, unless you can regrow: Once
    you have shrunk the instruction, the resulting offset 130 does not fit
    into the -128..127 range allowed for a byte offset, so the result
    would be a failure. Szymanski's approach is to never shrink
    instructions that might have this problem (and he has a way to
    distinguish those from others that are guaranteed not to have this
    problem, and that he allows to shrink).

    By contrast, start-small-and-grow never fails and always converges,
    without needing to identify problematic instructions.

    More relevant is that the order or growing or shrinking instructions can >change which other instructions grow or shrink and that means picking which >one to grow or shrink is important and that contributes to the >NP-completeness of the problem. You potentially have to try all possible >orders of growing/shrinking to find the optimal set.

    It's not about order, it's about sets. If one wants to do better than start-small-and-grow, one could take the set of n problematic
    instructions and try out all 2^n assignments of small/large to this
    set (the sizes of unproblematic instructions are then determined by start-small-and-grow).

    There are even other aspects that impact real-life implementations, such as >aligned instructions possibly executing faster, such that one isn't just >looking for the smallest size, but the smallest size that runs fastest.

    Alignment was not considered in Szymanski's paper (and probably not in
    the assembler he worked on), so the problem can occur without .align directives. Of course, with .align there are additional reasons for
    the non-monotonic behaviour.

    Concerning assemblers that select longer instructions for alignment considerations, I have never heard of that, and I doubt that an
    assembler does that, except for the filler instructions produced by
    the .align directive when it occurs in code.

    - anton
    --
    M. Anton Ertl
    [email protected]
    http://www.complang.tuwien.ac.at/anton/
    [Szymanski was working on a PDP-11 where all instructions are
    an integral number or words so there's no alignment to do. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Sun Jul 31 02:28:46 2022
    In the large-and-shrink model, you are not allowed to shrink an instruction which causes the program to become incorrect, it's a test you make before shrinking it, so you never shrink your example instruction and have to
    regrow it. I suppose you could do it by shrink-test-abort
    (regrow)/commit. However, that doesn't fix the problem where shrinking one instruction depends upon shrinking another to result in a valid program
    (and the instructions are mutually dependent that way).

    And, that's one of the reasons why the two results converge to different programs. There can be cliques of instructions that need to be all shrunk before the program is legal. (Or in the more realistic case, graphs where
    you have to color (small, large) just right to get the smallest legal
    program, which is why it becomes NP-complete and reduces to a
    satisfiability problem.)

    If you disallow shrinking instructions which don't produce legal programs (without any other changes, i.e. you cannot shrink your example
    instruction), the large-and-shrink model converges. However, there are combinations of shrunk instructions it never tries, because they are
    mutually dependent.

    And as far as I know, the first presentations of this idea only dealt with shrinking jump instructions, where all distances were positive, so that you could guarantee that the process was monotonic, i.e. all shrinking of instructions made more shrinking possible. And that's where the proofs
    that both algorithms could be shown to converge (but to different code sequences) applies.

    And the large-and-shrink algorithm was shown first, because if the
    algorithm was halted mid-way through, it still resulted in a legal
    program. The small-and-grow algorithm starts with an invalid program and
    needs to proceed to reach a state where the program is legal. If you don't
    let it converge, you have a broken instruction sequence.

    Kind regards,
    Chris

    -- ******************************************************************************

    Chris Clark email:
    [email protected]
    Compiler Resources, Inc. Web Site: http://world.std.com/~compres
    23 Bailey Rd voice: (508) 435-5016
    Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)