Forum: >>> Magnum BBS <<<

Redundant prefixes break fsrm in Ice Lake

From Tavis Ormandy@21:1/5 to All on Wed Nov 15 14:59:06 2023

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

Tavis.

--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger [email protected]
_\_V _( ) _( ) @taviso

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup@21:1/5 to Tavis Ormandy on Wed Nov 15 19:10:13 2023

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor, and
the decoder, instead of latching only the last (or first) prefix data,
ORs all the prefix data of a "kind" of prefix into a prefix container,
then execution is delivered a different pattern of bits than the
programmer expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??
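
A toy sketch of the two latching policies guessed at above. This is only an illustration: the flag names and bit values are hypothetical and are not taken from any real decoder.

```python
# Toy model only: flag names and bit values are hypothetical.
REP_FLAG = 0b01    # set by prefix byte 0xF3
REPNE_FLAG = 0b10  # set by prefix byte 0xF2
PREFIX_FLAGS = {0xF3: REP_FLAG, 0xF2: REPNE_FLAG}

def latch_last(prefix_bytes):
    """Container keeps only the most recent prefix of this kind."""
    container = 0
    for b in prefix_bytes:
        container = PREFIX_FLAGS[b]
    return container

def latch_or(prefix_bytes):
    """Container accumulates every prefix of this kind."""
    container = 0
    for b in prefix_bytes:
        container |= PREFIX_FLAGS[b]
    return container

same = [0xF3, 0xF3, 0xF3]   # REP REP REP
mixed = [0xF2, 0xF3]        # REPNE REP

print(latch_last(same), latch_or(same))    # identical prefixes: policies agree
print(latch_last(mixed), latch_or(mixed))  # different prefixes: policies diverge
```

In this toy model a repeated identical prefix is harmless under either policy; only different prefixes of the same kind produce a bit pattern that neither prefix alone would give.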



From Scott Lurndal@21:1/5 to MitchAlsup on Wed Nov 15 20:17:30 2023

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.


From MitchAlsup@21:1/5 to Scott Lurndal on Wed Nov 15 20:57:58 2023

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??


From BGB@21:1/5 to MitchAlsup on Wed Nov 15 16:36:51 2023

On 11/15/2023 2:57 PM, MitchAlsup wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the
programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I think it is semi-common to align function entry points and some labels
and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...

I think the idea here is that aligning function entry points can
potentially make the function calls slightly faster due to "cache magic"
or similar. Also INT3 crashes the program if it tries to branch into
this padding space.

But, at least, much beyond this, it is unclear how alignment would be
needed or beneficial on x86 or x86-64.

And, to this end (if one needs inline padding), using one of the
multi-byte NOP sequences seems less likely to invoke weird/undefined
behavior than trying to do something weird with opcode prefixes...


From Scott Lurndal@21:1/5 to MitchAlsup on Wed Nov 15 22:54:26 2023

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I refer you to the Intel Architecture Software Optimization Guide.

Specifically:

"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."


From MitchAlsup@21:1/5 to BGB on Wed Nov 15 23:34:46 2023

BGB wrote:

On 11/15/2023 2:57 PM, MitchAlsup wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the
programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I think it is semi-common to align function entry points and some labels
and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...

<
Yes, this is common (and useful)
<
How many functions start off with REP REP REP MOVS ??
<

I think the idea here is that aligning function entry points can
potentially make the function calls slightly faster due to "cache magic"
or similar. Also INT3 crashes the program if it tries to branch into
this padding space.

<
But REP REP REP MOVS never occurs at the entry point of a function !!
<

But, at least, much beyond this, it is unclear how alignment would be
needed or beneficial on x86 or x86-64.

And, to this end (if one needs inline padding), using one of the
multi-byte NOP sequences seems less likely to invoke weird/undefined
behavior than trying to do something weird with opcode prefixes...


From MitchAlsup@21:1/5 to Scott Lurndal on Wed Nov 15 23:36:15 2023

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I refer you to the Intel Architecture Software Optimization Guide.

Specifically:

"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions following label boundaries.


From BGB@21:1/5 to MitchAlsup on Wed Nov 15 17:53:25 2023

On 11/15/2023 5:34 PM, MitchAlsup wrote:

BGB wrote:

On 11/15/2023 2:57 PM, MitchAlsup wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I think it is semi-common to align function entry points and some
labels and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...

<
Yes, this is common (and useful)
<
How many functions start off with REP REP REP MOVS ??
<

I think the idea here is that aligning function entry points can
potentially make the function calls slightly faster due to "cache
magic" or similar. Also INT3 crashes the program if it tries to branch
into this padding space.

<
But REP REP REP MOVS never occurs at the entry point of a function !!
<

Granted, yes, I have not seen this one.

IME, it is usually something like:
...; INT3; INT3; INT3; PUSH RBP; MOV RBP, RSP; ...
Or similar...

And, at the end of a function:
RET; NOP; NOP; ...; INT3; INT3; ...

With any label-alignment via one of the multi-byte NOP encodings.

But, at least, much beyond this, it is unclear how alignment would be
needed or beneficial on x86 or x86-64.

And, to this end (if one needs inline padding), using one of the
multi-byte NOP sequences seems less likely to invoke weird/undefined
behavior than trying to do something weird with opcode prefixes...


From Scott Lurndal@21:1/5 to MitchAlsup on Thu Nov 16 00:36:31 2023

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I refer you to the Intel Architecture Software Optimization Guide.

Specifically:

"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq
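
As a quick check of the addresses in the dump above, the padding needed to reach the next 16-byte boundary can be computed directly. This is a minimal sketch; `padding_to_align` is a hypothetical helper name, and the only inputs are the two addresses shown in the listing.

```python
def padding_to_align(addr, alignment=16):
    """Bytes of filler needed so the next instruction starts on a boundary."""
    return -addr % alignment

nop_addr = 0x58EAE6                # address of the nopw in the dump
pad = padding_to_align(nop_addr)   # filler needed from that address
print(pad, hex(nop_addr + pad))    # 10 0x58eaf0
```

The 10-byte gap between the two addresses is exactly what is needed to put the `repz retq` on a 16-byte boundary.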


From John Levine@21:1/5 to All on Thu Nov 16 00:56:36 2023

According to Scott Lurndal <[email protected]>:

But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

What? Why wouldn't you use a NOP? The Intel manual has a list
of NOPs with sizes from one byte to nine.
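
For reference, a sketch of padding with that list. The byte sequences are the recommended multi-byte NOP encodings as I recall them from the NOP entry of the Intel instruction set reference, so check them against the manual before relying on them; `nop_fill` is a hypothetical helper, not something a particular assembler provides.

```python
# Recommended multi-byte NOP encodings, indexed by length in bytes
# (recalled from the Intel manual; verify before use).
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}

def nop_fill(n):
    """Return n bytes of padding built from the longest NOPs that fit."""
    out = bytearray()
    while n > 0:
        step = min(n, 9)
        out += NOPS[step]
        n -= step
    return bytes(out)

print(nop_fill(10).hex())  # one 9-byte NOP followed by one 1-byte NOP
```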

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly


From MitchAlsup@21:1/5 to BGB on Thu Nov 16 03:00:41 2023

BGB wrote:

On 11/15/2023 5:34 PM, MitchAlsup wrote:

BGB wrote:

On 11/15/2023 2:57 PM, MitchAlsup wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I think it is semi-common to align function entry points and some
labels and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...

<
Yes, this is common (and useful)
<
How many functions start off with REP REP REP MOVS ??
<

I think the idea here is that aligning function entry points can
potentially make the function calls slightly faster due to "cache
magic" or similar. Also INT3 crashes the program if it tries to branch
into this padding space.

<
But REP REP REP MOVS never occurs at the entry point of a function !!
<

Granted, yes, I have not seen this one.

IME, it is usually something like:
...; INT3; INT3; INT3; PUSH RBP; MOV RBP, RSP; ...
Or similar...

And, at the end of a function:
RET; NOP; NOP; ...; INT3; INT3; ...

With any label-alignment via one of the multi-byte NOP encodings.

Yes, but control leaves the previous function at RET and control arrives at
the next function at INT3 so the NOPs are never actually executed. And if
you looked at the ASCII assembly, you will see::

RET
NOP
NOP
label:
INT3

But, at least, much beyond this, it is unclear how alignment would be
needed or beneficial on x86 or x86-64.

And, to this end (if one needs inline padding), using one of the
multi-byte NOP sequences seems less likely to invoke weird/undefined
behavior than trying to do something weird with opcode prefixes...


From MitchAlsup@21:1/5 to Scott Lurndal on Thu Nov 16 03:06:11 2023

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

I refer you to the Intel Architecture Software Optimization Guide.

Specifically:

"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

face it:: x86 is so broken it is amazing that it works at all.

And never postulate that this is the BEST way of padding to some useful boundary--just like 68K used to

CMP D1,#7
BNE ELSE
// then clause
...
...
MOV dummy,#DW // consume the inst in the Else clause
ELSE:
INST // The immediate of the MOV consumes this instruction
// join point

And if you EVER get the chance to do your own ISA, make sure there is no
way to and no need to do these kinds of things.


From MitchAlsup@21:1/5 to All on Thu Nov 16 03:07:17 2023

Is there something wrong with

...
RET
.align 64B
Function:
...

??


From Terje Mathisen@21:1/5 to MitchAlsup on Thu Nov 16 08:26:55 2023

MitchAlsup wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

Tavis Ormandy wrote:

I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.

The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)

https://lock.cmpxchg8b.com/reptar.html

<
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor,
instead of latching only the last (or first) prefix data, but instead
ORs all the prefix data of a "kind" of prefix into a prefix container
then execution is delivered a different pattern of bits than the
programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

The code is already byte aligned, what more is necessary ??

Some loops, on some machines, run faster if the loop top is cache line
aligned (or maybe 16-byte/32-byte aligned), since that allows the entire
loop to fit within a single cache line, or whatever the loop buffer is?

I'm not arguing with you that it shouldn't be needed, just that there
have been and are several machines which do benefit from it.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


From Terje Mathisen@21:1/5 to John Levine on Thu Nov 16 11:37:25 2023

John Levine wrote:

According to Scott Lurndal <[email protected]>:

But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

What? Why wouldn't you use a NOP? The Intel manual has a list
of NOPs with sizes from one byte to nine.

Maybe because a few added/redundant prefix bytes on an instruction you
are going to do anyway could be even cheaper/faster than a NOP?

I.e. 0 vs 1 cycle (or a fraction thereof on average)?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


From EricP@21:1/5 to Scott Lurndal on Thu Nov 16 11:23:05 2023

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

Intel Instruction manual vol2 section 2.1:

"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.
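
A small sketch of the "one prefix from each of Group 1..4" rule, to make it concrete. The group table follows the legacy prefix groups as I understand them from the instruction-prefix section of the Intel manual; the helper names are hypothetical, and REX prefixes (such as the rex.r from the original post) are deliberately not modelled.

```python
# Legacy prefix bytes mapped to their group number.
PREFIX_GROUP = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,              # LOCK, REPNE/REPNZ, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2,              # CS, SS, DS segment overrides
    0x26: 2, 0x64: 2, 0x65: 2,              # ES, FS, GS segment overrides
    0x66: 3,                                # operand-size override
    0x67: 4,                                # address-size override
}

def split_prefixes(code):
    """Split the leading legacy prefix bytes from the rest of an instruction."""
    i = 0
    while i < len(code) and code[i] in PREFIX_GROUP:
        i += 1
    return code[:i], code[i:]

def crowded_groups(code):
    """Return the groups that contribute more than one prefix byte."""
    prefixes, _ = split_prefixes(code)
    counts = {}
    for b in prefixes:
        g = PREFIX_GROUP[b]
        counts[g] = counts.get(g, 0) + 1
    return sorted(g for g, n in counts.items() if n > 1)

print(crowded_groups(bytes.fromhex("f3f3f3a4")))  # REP REP REP MOVSB -> [1]
print(crowded_groups(bytes.fromhex("f3c3")))      # repz retq -> []
```

Note that `repz retq` carries only one prefix per group; under the sentence quoted above its problem is a repeat prefix on a non-string instruction, which this check does not look for.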


From MitchAlsup@21:1/5 to EricP on Fri Nov 17 18:42:18 2023

EricP wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

Intel Instruction manual vol2 section 2.1:

"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

This is what I was talking about; the decoder is just routing data to
a set of storage containers and only after identifying the OpCode, do
these containers modify the behavior of the instruction during execution.
The decoder would not "count" the prefixes, just route data, and if
data came from multiple locations, what gets latched in the container
becomes mask specific.

And since they are reserved, your random code generator should not be generating them.


From Paul A. Clayton@21:1/5 to John Levine on Fri Nov 17 22:38:26 2023

On 11/15/23 7:56 PM, John Levine wrote:

According to Scott Lurndal <[email protected]>:

But who ever decided multiple prefixes of the same kind are LEGAL ??

The compiler people use multiple prefixes to align code.

What? Why wouldn't you use a NOP? The Intel manual has a list
of NOPs with sizes from one byte to nine.

It is possible that a NOP is more expensive than bloating one or
more instructions with alternative encodings. Even if a NOP is
never "executed" (and, of course, early microprocessors did just
execute NOPs), it might consume a ROB entry (to facilitate precise
trapping when an instruction address is fetched, e.g. — obviously
one could have coarser-grained ROB entries and replay from an
earlier point even just "fusing" a NOP with the following
instruction).

Even if every compiler did "the right thing" to provide target
alignment, clever programmers could include assembly to do "the
clever thing". A clever programmer might reason that a NOP
increases instruction count and therefore is harmful to
performance. Also there may be a greater fear that some other
programmer would remove a NOP as useless when a useless prefix
might not be recognized as useless.


From EricP@21:1/5 to EricP on Sat Nov 18 10:32:15 2023

EricP wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding
instructions
following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

Intel Instruction manual vol2 section 2.1:

"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.

So the 'REP REP MOVS' and 'repz retq' have been clearly documented
as unpredictable for a long time.


From John Levine@21:1/5 to All on Sat Nov 18 16:36:25 2023

According to EricP <[email protected]>:

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.

I have the October 1979 8086 Family User's Manual here. (The actual
paper one, not a scan.)

In the discussion of repeat prefixes, it says they're interruptible,
and if a second or third segment or lock prefix is present it won't
work because it only remembers one prefix for the interrupt. You can
turn off interrupts, but an NMI might still break stuff.
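
A toy sketch of that failure mode, assuming (as a simplification of the manual's description) that on resume only the prefix byte immediately before the opcode is re-executed. `resume_bytes` and the `remembered` parameter are hypothetical names for illustration.

```python
def resume_bytes(instr, remembered=1):
    """Bytes re-executed after an interrupt, assuming only the last
    `remembered` prefix bytes before the opcode are kept (toy assumption)."""
    prefixes, opcode = instr[:-1], instr[-1:]
    start = max(0, len(prefixes) - remembered)
    return prefixes[start:] + opcode

rep_es_movsb = bytes.fromhex("f326a4")   # REP, ES: override, MOVSB
print(resume_bytes(rep_es_movsb).hex())  # 26a4: the REP prefix is lost
print(resume_bytes(bytes.fromhex("f3a4")).hex())  # f3a4: one prefix survives
```

With a single prefix the resumed instruction is intact; with two, the earlier one is dropped in this model, so the string operation would not continue repeating.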

The only plausible two prefix instruction I can think of is an exchange
with a segment override:

LOCK XCHG ES:FOO,AX

The assembler will generate the lock prefix first. I doubt they gave
much thought to what would happen if the prefixes were in the other
order.

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly


From Anton Ertl@21:1/5 to EricP on Sat Nov 18 16:21:46 2023

EricP <[email protected]> writes:

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.

So the 'REP REP MOVS' and 'repz retq' have been clearly documented
as unpredictable for a long time.

Fortunately, unlike "undefined behaviour" advocates and others who
point to documentation, Intel is aware of Hyrum's law and does not do
"unpredictable behaviour" on any instruction sequence on purpose.
Consequently, they treated the REX MOVSB issue as a bug that they
should fix. However, in this case probably not because they expected
to see such code in the wild (AFAIK it was only found by fuzzing), but
because it allows privilege escalation.

In particular, if they now implemented a CPU where "repz retq" did
something different than "retq", that would mean that a lot of
binaries would no longer work, and no amount of pointing to
documentation from 2001 or from 1978 to 2023 would stop the reputation
damage that would ensue. That's because compilers (and probably also
assembly language programmers) actually followed other documentation
that recommended using repz retq (see
<https://repzret.org/p/repzret/>).

In this case, one interesting aspect is that the K8 is the first AMD64
CPU, years before any Intel CPU could be bought that would be
compatible with this instruction set. So by the time Intel brought
out their AMD64-compatible CPU (although they had their own names for
the architecture: IA32e, then EM64T, finally Intel64), there were a
lot of binaries with repz retq around, and if Intel wanted to sell
those CPUs into the 64-bit market, they had better support these
binaries, and the 2001 documentation for a different architecture did
not matter in any case.

If you or Intel want to reserve some encoding space, the way to do it
is to either trap on the encoding, or treat it as noop. The noops are
for encodings that you later want to define as hints, because hints
architecturally are noops.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Scott Lurndal@21:1/5 to EricP on Sat Nov 18 16:21:24 2023

EricP <[email protected]> writes:

EricP wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding
instructions following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

Intel Instruction manual vol2 section 2.1:

"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.

The iAPX 86,88 manual from 1981 states when discussing REP/REPE/REPNE
in the context of interrupts:

"The processor 'remembers' only one prefix in effect
at the time of the interrupt, the prefix that immediately precedes
the string instruction."

Which implies that segment overrides in conjunction with a repeat
prefix won't be preserved if the MOVS is interrupted (they suggest
CLI/STI during string operations with segment override(s), noting
that won't help if an NMI occurs).

I could find no text describing any other restrictions on prefix
bytes. For that matter, while there were references to segment
override prefixes, they weren't actually enumerated in the data sheet.

The instruction set reference data is interesting with respect
to the clock counts for each instruction. A 16-bit integer
multiply, for example, took between 128 and 154 clocks when
using register operands.

Conditional branches were 16 or 4 clocks (presumably taken vs.
not-taken).

So the 'REP REP MOVS' and 'repz retq' have been clearly documented
as unpredictable for a long time.

From Anton Ertl@21:1/5 to John Levine on Sat Nov 18 17:48:10 2023

John Levine <[email protected]> writes:

The only plausible two prefix instruction I can think of is an exchange
with a segment override:

LOCK XCHG ES:FOO,AX

When looking for REPZ, I found <https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz>, and it
lists, e.g.,

F3 REX.W A4

F3 is the REP prefix. This instruction is a REP MOVSB, and the REX
prefix seems redundant to me in this case. I don't know if that's the
one the OP was about, though.

It also lists

F3 REX.W A5

That's REP MOVSQ, and the REX prefix is not redundant here. But
that's AMD64.

However, the page also lists

F3 A5

which is either REP MOVSW or REP MOVSD (REP MOVSL for AT&T syntax),
depending on mode. But there is also the 66/67 prefix for switching
to the other mode. E.g., if the mode is 32-bit addresses and 32-bit
data, and you want a REP MOVSW that uses 16-bit address registers,
maybe you would do something like

66 67 F3 A5

But that's IA-32; for 8086 I indeed cannot think of other prefixes.
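
A minimal sketch of how the byte sequences above split into legacy
prefixes, an optional REX byte, and the opcode. The group table follows
the Group 1..4 split from the Intel manual quoted earlier in the thread;
the function names are made up for illustration, and this is nowhere
near a full decoder (no two-byte opcodes, no ModRM, no VEX).

```python
# Legacy prefix bytes mapped to their group number (Group 1..4).
PREFIX_GROUP = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                         # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,  # segment overrides
    0x66: 3,                                           # operand-size override
    0x67: 4,                                           # address-size override
}


def split_prefixes(code, long_mode=False):
    """Return (legacy_prefixes, rex_or_None, opcode_byte) for a byte string."""
    i = 0
    legacy = []
    while i < len(code) and code[i] in PREFIX_GROUP:
        legacy.append(code[i])
        i += 1
    rex = None
    # 0x40..0x4F is a REX prefix only in 64-bit mode; elsewhere it is INC/DEC.
    if long_mode and i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    return legacy, rex, code[i]


def duplicated_groups(legacy):
    """Return the group numbers that appear more than once, in order seen."""
    seen, dup = set(), []
    for b in legacy:
        g = PREFIX_GROUP[b]
        if g in seen and g not in dup:
            dup.append(g)
        seen.add(g)
    return dup


# 66 67 F3 A5: three legacy prefixes from three different groups, opcode A5.
print(split_prefixes(bytes.fromhex("6667f3a5")))
# F3 48 A5: REP, then REX.W (0x48), opcode A5 (REP MOVSQ).
print(split_prefixes(bytes.fromhex("f348a5"), long_mode=True))
# REP REP: two prefixes from Group 1.
print(duplicated_groups([0xF3, 0xF3]))
```

With long_mode=True the bytes F3 44 A4 come out as REP, REX.R (0x44),
MOVSB, which is the general shape of the "redundant rex.r prefix" case
from the original post; the exact bytes used in the write-up are not
reproduced here.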

For MOVS, a segment override prefix overrides the implicit DS: of the
source operand; the segment of the destination (implicitly ES:) cannot
be overridden. The page on REP above says nothing about segment
override limitations, so I expect that this limitation was dropped in
the 386 and later processors (probably already in the 286, where the
idea was to use segment registers (and overrides) a lot).

The assembler will generate the lock prefix first. I doubt they gave
much thought to what would happen if the prefixes were in the other
order.

For the original decoder, each prefix probably set a bit, and the
order did not matter. Therefore later implementations had to accept
all orders.
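
To illustrate the guess above (it is only a guess about the original
decoder, and the flag layout below is invented for the example): if each
prefix byte just ORs one bit into a latch, then neither the order of the
prefixes nor repeating one of them changes what the execution unit sees.

```python
from itertools import permutations

# Hypothetical one-bit-per-prefix latch for the 8086 prefix bytes.
FLAG = {
    0xF0: 0x01,  # LOCK
    0xF2: 0x02,  # REPNE
    0xF3: 0x04,  # REP/REPE
    0x26: 0x08,  # ES:
    0x2E: 0x10,  # CS:
    0x36: 0x20,  # SS:
    0x3E: 0x40,  # DS:
}


def latch(prefixes):
    """OR the flag bit of every prefix byte into a single latch value."""
    flags = 0
    for b in prefixes:
        flags |= FLAG[b]
    return flags


# LOCK ES: and ES: LOCK end up with the same latch contents.
orders = {latch(p) for p in permutations([0xF0, 0x26])}
print(len(orders))

# REP REP REP is indistinguishable from a single REP.
print(latch([0xF3, 0xF3, 0xF3]) == latch([0xF3]))
```

What this model does not capture is two different segment overrides on
one instruction; a real decoder has to pick one, and the sketch simply
sets both bits.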

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From John Levine@21:1/5 to All on Sat Nov 18 22:07:23 2023

According to Anton Ertl <[email protected]>:

For MOVS, a segment override prefix overrides the implicit DS: of the
source operand; the segment of the destination (implicitly ES:) cannot
be overridden. The page on REP above says nothing about segment
override limitations, so I expect that this limitation was dropped in
the 386 and later processors (probably already in the 286, where the
idea was to use segment registers (and overrides) a lot).

I looked at my 1985 i286 manual. The LOCK prefix was fairly useless
since XCHG now always locks, so it only affected MOVS, INS, and OUTS,
I guess for unaligned word transfers. They describe REP MOVS and say
that segment overrides work for the source address, no warning about
interrupts.

Appendix D on compatibility with the 86/88 has cryptic advice not to
use duplicate prefixes because the 286 has a maximum instruction
length of 10 bytes, while the 86/88 had no limit.

So I guess you're right about the 86/88 prefixes setting a flag bit
and otherwise being forgotten, but the 286 remembered the whole
instruction with the prefixes so long as it wasn't excessively long.

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From MitchAlsup@21:1/5 to Anton Ertl on Sat Nov 18 23:41:00 2023

Anton Ertl wrote:

If you or Intel want to reserve some encoding space, the way to do it
is to either trap on the encoding, or treat it as noop. The noops are
for encodings that you later want to define as hints, because hints
architecturally are noops.

No, for future compatibility, you can only raise exceptions on unrecognized
bit patterns--otherwise you add future undefined behavior to your
architecture. Taking unrecognized things as NoOps is a sure way to shoot
yourself in the foot with a very slow and very painful bullet.

See your own trailer::
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

- anton

From Chris M. Thomasson@21:1/5 to John Levine on Sat Nov 18 16:20:30 2023

On 11/18/2023 8:36 AM, John Levine wrote:

According to EricP <[email protected]>:

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.

I have the October 1979 8086 Family User's Manual here. (The actual
paper one, not a scan.)

In the discussion of repeat prefixes, it says they're interruptible,
and if a second or third segment or lock prefix is present it won't
work because it only remembers one prefix for the interrupt. You can
turn off interrupts, but an NMI might still break stuff.

The only plausible two prefix instruction I can think of is an exchange
with a segment override:

LOCK XCHG ES:FOO,AX

Funny aspect, XCHG has an implied LOCK prefix... :^)

The assembler will generate the lock prefix first. I doubt they gave
much thought to what would happen if the prefixes were in the other
order.

From Anton Ertl@21:1/5 to MitchAlsup on Sun Nov 19 13:35:08 2023

[email protected] (MitchAlsup) writes:

Anton Ertl wrote:

If you or Intel want to reserve some encoding space, the way to do it
is to either trap on the encoding, or treat it as noop. The noops are
for encodings that you later want to define as hints, because hints
architecturally are noops.

No, for future compatibility, you can only raise exceptions on unrecognized
bit patterns--otherwise you add future undefined behavior to your architecture.

What behaviour is undefined by a noop (which is what a hint is architecturally)?

Taking unrecognized things as NoOps is a sure way to shoot yourself in the foot
with a very slow and very painful bullet.

They are recognized as noops, and microarchitecturally have no
specific performance impact. In the future they will continue to be
noops, but they may influence the performance by providing
microarchitectural hints. True, if somebody uses these noops instead
of the recommended ones, in the future their application may suffer a
slowdown, but the application will work correctly.

The alternative is to add a previously trapping bit pattern as a hint.
The result will be that the hint will not be used for at least a
decade, because nobody wants their application to die if it is run
on hardware of the previous generation.

Anyway, the question is if hint instructions are still relevant. For
the most part, they seem to have been replaced by history-based
mechanisms.

* Branch direction hints? We have branch predictors.

* Branch target hints? We have BTBs and indirect branch predictors.

* Prefetch instructions? Hardware prefetchers tend to work better, so
they fell into disuse.

Is there anything I forgot?

Searching for "hint" in <https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf>,
they use the register numbers of the JALR (indirect call) instruction
for giving a hint on whether and how to use the return-address stack
(x1 and x5 are used for calls, returns or coroutine calls).

Those are the only hints that the instruction set specification defines.
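
As a rough sketch of that JALR convention (my reading of the
return-address-stack table in the spec; the function name and the result
strings are made up for the example, and the exact wording of the
rd == rs1 rows should be checked against the spec):

```python
LINK_REGS = {1, 5}  # x1 (ra) and x5 (t0) count as link registers


def ras_hint(rd, rs1):
    """Return the return-address-stack action implied by a JALR's rd and rs1."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"          # plain indirect jump
    if not rd_link:
        return "pop"           # return, e.g. jalr x0, 0(x1)
    if not rs1_link:
        return "push"          # call through an ordinary register
    # Both are link registers: same register is a call, different is a coroutine swap.
    return "push" if rd == rs1 else "pop, then push"


print(ras_hint(0, 1))    # ret
print(ras_hint(1, 10))   # call via a0
print(ras_hint(1, 5))    # coroutine-style swap between x1 and x5
```

The point is that no separate hint encoding is needed: the hint rides on
register numbers that the instruction has to carry anyway.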

It also defines that a number of compressed encodings that do not
change architectural state are noops that may become hints in the
future, e.g. C.ADDI with an immediate value of 0.
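
A small sketch of what that looks like at the bit level, assuming the
usual RVC layout for C.ADDI (quadrant 01, funct3 000, rd in bits 11:7,
immediate in bit 12 and bits 6:2). The function name is invented, and the
rule that rd = x0 with a nonzero immediate is also a hint is taken from
my reading of the spec rather than verified here.

```python
def classify_c_addi(insn):
    """Classify a 16-bit code point that may sit in the C.ADDI encoding slot."""
    if (insn & 0x3) != 0b01 or ((insn >> 13) & 0x7) != 0b000:
        return "not C.ADDI"
    rd = (insn >> 7) & 0x1F
    # Raw 6-bit immediate field: bit 12 is imm[5], bits 6:2 are imm[4:0].
    imm = (((insn >> 12) & 0x1) << 5) | ((insn >> 2) & 0x1F)
    if rd == 0 and imm == 0:
        return "C.NOP"
    if rd == 0 or imm == 0:
        return "HINT"          # changes no architectural state
    return "C.ADDI"


print(classify_c_addi(0x0001))  # rd = x0,  imm = 0
print(classify_c_addi(0x0501))  # rd = x10, imm = 0
print(classify_c_addi(0x0505))  # rd = x10, imm = 1
```

An implementation that knows nothing about a future hint can execute the
middle case as an add of zero and still be correct, which is the
property being argued for above.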

Interestingly, they did not define such noops as possible future hints
for the uncompressed instruction set. I guess they expected any
implementation that implements hint instructions to also implement the
compressed extension, but given that big implementations tend to not
need hints (see above), while smaller ones may benefit from them, I
wonder whether this is really such a good idea.

BTW, thanks for producing a much more readable posting than what you
used to produce with G2 (Google Groups). Rocksolid Light (used by
NovaBBS) seems to be good for your readability.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Terje Mathisen@21:1/5 to Scott Lurndal on Sun Nov 19 22:56:50 2023

Scott Lurndal wrote:

EricP <[email protected]> writes:

EricP wrote:

Scott Lurndal wrote:

[email protected] (MitchAlsup) writes:

<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding
instructions following label boundaries.

In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.

58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)

58eaf0: f3 c3 repz retq

Intel Instruction manual vol2 section 2.1:

"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"

These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.

Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.

The iAPX 86,88 manual from 1981 states when discussing REP/REPE/REPNE
in the context of interrupts:

"The processor 'remembers' only one prefix in effect
at the time of the interrupt, the prefix that immediately precedes
the string instruction."

Which implies that segment overrides in conjunction with a repeat
prefix won't be preserved if the MOVS is interrupted (they suggest
CLI/STI during string operations with segment override(s), noting
that won't help if an NMI occurs).

I have written code to detect/test for this particular issue:

I started a big REP SEGES MOVSB, with the prefix bytes in that order,
and the repeat count in CX large enough that it would take more than 55
ms to execute. This was long enough that a timer tick interrupt was
guaranteed, so I could test the remaining CX value (JCXZ) to check if I
was running on a CPU which disallowed multiple prefix bytes.
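
The idea behind the test can be modelled in a few lines. This is a toy
model built only from the manual text quoted above (after an interrupt
the CPU restarts at the prefix immediately preceding the string
instruction); the function and parameter names are invented, and it
tracks nothing but CX.

```python
def cx_after_interrupted_movsb(prefixes, cx, interrupt_after, remembers_all):
    """Return CX after a prefixed MOVSB that is interrupted exactly once.

    prefixes        -- prefix mnemonics in encoding order, e.g. ["REP", "ES"]
    interrupt_after -- iterations completed before the interrupt arrives
    remembers_all   -- True if the CPU restarts with every prefix intact
    """
    active = list(prefixes)
    if "REP" not in active:
        return cx                      # a plain MOVSB never touches CX
    cx -= min(interrupt_after, cx)
    if cx == 0:
        return 0                       # finished before the interrupt hit
    # Interrupt taken. A forgetful CPU resumes at the last prefix only.
    if not remembers_all:
        active = active[-1:]
    if "REP" in active:
        cx = 0                         # repeat resumes and runs to completion
    # Otherwise a single un-repeated MOVSB executes and CX stays nonzero.
    return cx


# REP ES: MOVSB on a CPU that forgets all but the last prefix.
print(cx_after_interrupted_movsb(["REP", "ES"], 60000, 1000, False))
# Same instruction on a CPU that keeps every prefix.
print(cx_after_interrupted_movsb(["REP", "ES"], 60000, 1000, True))
```

With the prefixes the other way round (ES: REP MOVSB) the forgetful CPU
would finish with CX at zero, but it would have dropped the segment
override part-way through, which this model does not show.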

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
