Forum: >>> Magnum BBS <<<

Re: Assembling span-dependent instructions

From Kaz Kylheku@21:1/5 to Anton Ertl on Wed Jul 27 22:52:40 2022

XPost: comp.arch

On 2022-07-27, Anton Ertl <[email protected]> wrote:

However, one can also construct cases where making the code larger can
reduce the minimum size of the immediate operand, e.g.:

foo:
movl foo+133-bar(%rdi),%eax
bar:

That's weird; what is accessed this way, relative to the code,
and does it occur in compiler output?

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Thu Jul 28 16:02:16 2022

Anton,

Curiously, I recently used that example (limited to jumps) to explain
something on Quora, how two different approaches can yield different
answers to the same question.

You can make both, the large-and-shrink, and small-and-grow approaches
converge (just ban doing the opposite, once shrunk an instruction cannot
regrow or once grown an instruction cannot reshrink). And under certain assumptions, you can show that the small-and-grow result will always be
"more optimal" (smaller) than the large-and-shrink result. However, one of those assumptions is that shrinking some instruction will not require a different instruction to grow--the results must be monotonic. That is not always true in real-life instruction sets.

More relevant is that the order or growing or shrinking instructions can
change which other instructions grow or shrink and that means picking which
one to grow or shrink is important and that contributes to the
NP-completeness of the problem. You potentially have to try all possible orders of growing/shrinking to find the optimal set.

There are even other aspects that impact real-life implementations, such as aligned instructions possibly executing faster, such that one isn't just looking for the smallest size, but the smallest size that runs fastest.

Nice to see someone else remembering the same issues.

Kind regards,
Chris

-- ******************************************************************************

Chris Clark email:
[email protected]
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to [email protected] on Fri Jul 29 14:22:01 2022

On Friday, July 29, 2022 at 1:38:25 PM UTC-7, [email protected] wrote:

(snip)

Z architecture (modern versions of IBM 360) has
such problems too: there are variants of instruction having different
lengths but even longest variant have limited range of available
offsets. At least some versions of Z architecture had severe penalty
for simultaneusly accessing the same cache line for instruction fetch
and data access, so putting constant pools in separate cache line was
very important.

I presume this is true for any system with separate data/instuction
cache. It might be more of a problem for z/, with especially long
cache lines.

From the S/360 days, it was usual for data, even variable data, to be
close to code. That is, for non-reentrant programs. Most assembly
code and Fortran did that.

Otherwise, I believe the original question comes up on any machine
with variable sized branch instructions. Many of the stories I
remember are from the PDP-8.

A similar question comes up generating the Table of Contents
with LaTeX. When you run it, it generates the file used to make
the ToC next time. You run it again to generate the ToC, and
it will tell you if anything moved since the time before.
Hopefully it converges.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Christopher F Clark on Sat Jul 30 09:28:06 2022

Christopher F Clark <[email protected]> writes:

You can make both, the large-and-shrink, and small-and-grow approaches >converge (just ban doing the opposite, once shrunk an instruction cannot >regrow or once grown an instruction cannot reshrink).

My example

foo:
movl foo+133-bar(%rdi),%eax
bar:

is one where start-large-and-shrink fails, unless you can regrow: Once
you have shrunk the instruction, the resulting offset 130 does not fit
into the -128..127 range allowed for a byte offset, so the result
would be a failure. Szymanski's approach is to never shrink
instructions that might have this problem (and he has a way to
distinguish those from others that are guaranteed not to have this
problem, and that he allows to shrink).

By contrast, start-small-and-grow never fails and always converges,
without needing to identify problematic instructions.

More relevant is that the order or growing or shrinking instructions can >change which other instructions grow or shrink and that means picking which >one to grow or shrink is important and that contributes to the >NP-completeness of the problem. You potentially have to try all possible >orders of growing/shrinking to find the optimal set.

It's not about order, it's about sets. If one wants to do better than start-small-and-grow, one could take the set of n problematic
instructions and try out all 2^n assignments of small/large to this
set (the sizes of unproblematic instructions are then determined by start-small-and-grow).

There are even other aspects that impact real-life implementations, such as >aligned instructions possibly executing faster, such that one isn't just >looking for the smallest size, but the smallest size that runs fastest.

Alignment was not considered in Szymanski's paper (and probably not in
the assembler he worked on), so the problem can occur without .align directives. Of course, with .align there are additional reasons for
the non-monotonic behaviour.

Concerning assemblers that select longer instructions for alignment considerations, I have never heard of that, and I doubt that an
assembler does that, except for the filler instructions produced by
the .align directive when it occurs in code.

- anton
--
M. Anton Ertl
[email protected]
http://www.complang.tuwien.ac.at/anton/
[Szymanski was working on a PDP-11 where all instructions are
an integral number or words so there's no alignment to do. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Sun Jul 31 02:28:46 2022

In the large-and-shrink model, you are not allowed to shrink an instruction which causes the program to become incorrect, it's a test you make before shrinking it, so you never shrink your example instruction and have to
regrow it. I suppose you could do it by shrink-test-abort
(regrow)/commit. However, that doesn't fix the problem where shrinking one instruction depends upon shrinking another to result in a valid program
(and the instructions are mutually dependent that way).

And, that's one of the reasons why the two results converge to different programs. There can be cliques of instructions that need to be all shrunk before the program is legal. (Or in the more realistic case, graphs where
you have to color (small, large) just right to get the smallest legal
program, which is why it becomes NP-complete and reduces to a
satisfiability problem.)

If you disallow shrinking instructions which don't produce legal programs (without any other changes, i.e. you cannot shrink your example
instruction), the large-and-shrink model converges. However, there are combinations of shrunk instructions it never tries, because they are
mutually dependent.

And as far as I know, the first presentations of this idea only dealt with shrinking jump instructions, where all distances were positive, so that you could guarantee that the process was monotonic, i.e. all shrinking of instructions made more shrinking possible. And that's where the proofs
that both algorithms could be shown to converge (but to different code sequences) applies.

And the large-and-shrink algorithm was shown first, because if the
algorithm was halted mid-way through, it still resulted in a legal
program. The small-and-grow algorithm starts with an invalid program and
needs to proceed to reach a state where the program is legal. If you don't
let it converge, you have a broken instruction sequence.

Kind regards,
Chris

-- ******************************************************************************

Chris Clark email:
[email protected]
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Bob Worm
  Fri Jul 31 15:23:30 2026
  from Wales, Uk via Telnet
- Rixter
  Fri Jul 31 12:17:09 2026
  from Madison, Nc via Telnet
- Krenn
  Fri Jul 31 10:41:58 2026
  from Sydney, Nsw via Telnet
- Krenn
  Fri Jul 31 10:34:35 2026
  from Sydney, Nsw via Telnet
- Shift
  Fri Jul 31 06:46:34 2026
  from Leeds, England via SSH
- Centurion
  Fri Jul 31 00:59:56 2026
  from Berea, Ohio via Telnet
- Rixter
  Fri Jul 31 00:00:46 2026
  from Madison, Nc via Telnet
- Bob Worm
  Thu Jul 30 20:01:55 2026
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	741
Nodes:	16 (2 / 14)
Uptime:	118:22:57
Calls:	12,465
Calls today:	7
Files:	15,200
Messages:	6,538,266

Re: Assembling span-dependent instructions

Who's Online

Recent Visitors

System Info