I thought this might interest some posters here, I wrote up a bug we discovered in the fast short repeat move feature added in Ice Lake.
The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)
https://lock.cmpxchg8b.com/reptar.html
Tavis.
Tavis Ormandy wrote:
I thought this might interest some posters here, I wrote up a bug we
discovered in the fast short repeat move feature added in Ice Lake.
The quick summary is that adding a redundant rex.r prefix to movsb seems
to cause ROB entries to be associated with incorrect addresses. I have
no special insight into what the microcode is doing, maybe some reader
here can read between the lines and explain what is going on :)
https://lock.cmpxchg8b.com/reptar.html
My GUESS has to do with how instruction-boundaries are determined.
When the decoder encounters a prefix, it latches prefix data and goes
on decoding. So, if you have multiple prefixes of the same flavor, and the
decoder, instead of latching only the last (or first) prefix data, ORs all
the prefix data of a "kind" of prefix into a prefix container, then
execution is delivered a different pattern of bits than the programmer
expected.
<
But who ever decided multiple prefixes of the same kind are LEGAL ??
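As a purely illustrative sketch of the guess above (not a description of what Intel's decoder or microcode actually does), the following Python models two hypothetical ways a decoder could accumulate REX prefix bits: latching only the last prefix seen, or ORing every prefix of that kind into one container. The function names and the example byte sequence are assumptions made for illustration.

```python
# Hypothetical model of prefix accumulation; NOT Intel's actual decoder logic.

def is_rex(byte):
    """REX prefixes occupy 0x40..0x4F in 64-bit mode (bit layout 0100WRXB)."""
    return 0x40 <= byte <= 0x4F

def latch_last(prefix_bytes):
    """Keep only the W/R/X/B bits of the last REX prefix seen."""
    bits = 0
    for b in prefix_bytes:
        if is_rex(b):
            bits = b & 0x0F
    return bits

def or_all(prefix_bytes):
    """OR the W/R/X/B bits of every REX prefix into one container."""
    bits = 0
    for b in prefix_bytes:
        if is_rex(b):
            bits |= b & 0x0F
    return bits

# Two REX prefixes in a row: REX.R (0x44) followed by REX.B (0x41).
prefixes = [0x44, 0x41]
print(f"latch last: {latch_last(prefixes):04b}")  # 0001
print(f"OR all:     {or_all(prefixes):04b}")      # 0101
```

Under the "OR all" model the execution side would see both R and B set, which is not what either prefix says on its own.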
[email protected] (MitchAlsup) writes:
But who ever decided multiple prefixes of the same kind are LEGAL ??
The compiler people use multiple prefixes to align code.
Scott Lurndal wrote:
The compiler people use multiple prefixes to align code.
The code is already byte aligned, what more is necessary ??
On 11/15/2023 2:57 PM, MitchAlsup wrote:
The code is already byte aligned, what more is necessary ??
I think it is semi-common to align function entry points and some labels
and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...
I think the idea here is that aligning function entry points can
potentially make function calls slightly faster due to "cache magic"
or similar. Also, INT3 crashes the program if it tries to branch into
this padding space.
But, at least, much beyond this, it is unclear how alignment would be
needed or beneficial on x86 or x86-64.
And, to this end (if one needs inline padding), using one of the
multi-byte NOP sequences seems less likely to invoke weird/undefined
behavior than trying to do something weird with opcode prefixes...
[email protected] (MitchAlsup) writes:
The code is already byte aligned, what more is necessary ??
I refer you to the Intel Architecture Software Optimization Guide.
Specifically:
"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."
BGB wrote:
I think it is semi-common to align function entry points and some
labels and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...
Yes, this is common (and useful)
<
How many functions start off with REP REP REP MOVS ??
<
I think the idea here is that aligning function entry points can
potentially make function calls slightly faster due to "cache
magic" or similar. Also, INT3 crashes the program if it tries to branch
into this padding space.
But REP REP REP MOVS never occurs at the entry point of a function !!
<
Scott Lurndal wrote:
I refer you to the Intel Architecture Software Optimization Guide.
Specifically:
"Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch
targets should be 16-byte aligned."
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.
On 11/15/2023 5:34 PM, MitchAlsup wrote:
BGB wrote:
I think it is semi-common to align function entry points and some
labels and similar, but IME this was usually done with NOP or "INT 3"
instructions or similar...
Yes, this is common (and useful)
<
How many functions start off with REP REP REP MOVS ??
<
I think the idea here is that aligning function entry points can
potentially make function calls slightly faster due to "cache
magic" or similar. Also, INT3 crashes the program if it tries to branch
into this padding space.
But REP REP REP MOVS never occurs at the entry point of a function !!
<
Granted, yes, I have not seen this one.
IME, it is usually something like:
...; INT3; INT3; INT3; PUSH RBP; MOV RBP, RSP; ...
Or similar...
And, at the end of a function:
RET; NOP; NOP; ...; INT3; INT3; ...
With any label-alignment via one of the multi-byte NOP encodings.
[email protected] (MitchAlsup) writes:
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.
In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.
58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
58eaf0: f3 c3 repz retq
According to Scott Lurndal <[email protected]>:
But who ever decided multiple prefixes of the same kind are LEGAL ??
The compiler people use multiple prefixes to align code.
What? Why wouldn't you use a NOP? The Intel manual has a list
of NOPs with sizes from one byte to nine.
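To make the NOP-padding alternative concrete, here is a minimal Python sketch that pads an address up to a 16-byte boundary using the one-to-nine-byte NOP encodings recommended on the NOP instruction page of the Intel manual. The helper name `padding_for` is hypothetical, and the greedy "largest NOP first" choice is just one plausible policy, not what any particular assembler is known to do. The example address is borrowed from the objdump listing quoted earlier in the thread.

```python
# Recommended multi-byte NOP encodings (1 to 9 bytes) from the Intel manual.
NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}

def padding_for(address, alignment=16):
    """Return a list of NOP encodings that pads `address` up to `alignment`.

    Hypothetical helper: greedily picks the longest available NOP each time.
    """
    gap = -address % alignment  # bytes needed to reach the next boundary
    chunks = []
    while gap > 0:
        size = min(gap, 9)
        chunks.append(NOPS[size])
        gap -= size
    return chunks

pad = padding_for(0x58EAE6)
print([len(c) for c in pad])  # [9, 1]
```

No redundant prefixes on a real instruction are needed to fill the gap this way.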
Scott Lurndal wrote:
[email protected] (MitchAlsup) writes:
<
How many branch targets have REP REP REP MOVS at the label??
<
You see, these REP REP REP MOVS's almost invariably have preceding instructions
following label boundaries.
In looking at a fairly recent ELF binary, mostly I see various length
nops, and a bunch of 'repz retq' sequences.
58eae6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
58eaf0: f3 c3 repz retq
Intel Instruction manual vol2 section 2.1:
"Use of repeat prefixes and/or undefined opcodes with other Intel 64 or
IA-32 instructions is reserved; such use may cause unpredictable behavior"
These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.
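For readers who want to check a byte sequence against the "one prefix per group" rule, here is a small Python sketch. It covers only the four legacy prefix groups (LOCK/REP, segment overrides, operand size, address size) and deliberately ignores REX, VEX and friends; the function names are hypothetical.

```python
# Legacy prefix bytes mapped to their group number (groups 1 to 4).
PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                              # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,   # segment overrides
    0x66: 3,                                                # operand-size override
    0x67: 4,                                                # address-size override
}

def split_prefixes(code):
    """Split the leading legacy prefixes from the rest of the instruction bytes."""
    i = 0
    while i < len(code) and code[i] in PREFIX_GROUPS:
        i += 1
    return code[:i], code[i:]

def repeated_groups(code):
    """Return the sorted group numbers that occur more than once in the prefixes."""
    prefixes, _ = split_prefixes(code)
    seen, repeated = set(), set()
    for b in prefixes:
        group = PREFIX_GROUPS[b]
        if group in seen:
            repeated.add(group)
        seen.add(group)
    return sorted(repeated)

print(repeated_groups(bytes([0xF3, 0xC3])))              # [] (repz ret)
print(repeated_groups(bytes([0xF3, 0xF3, 0xF3, 0xA4])))  # [1] (rep rep rep movsb)
print(repeated_groups(bytes([0x66, 0x2E, 0x0F, 0x1F])))  # [] (start of the nopw above)
```

Note that this only detects repeated groups; it says nothing about whether a given prefix is meaningful for the opcode that follows, which is the separate "reserved" case the manual quote describes.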
Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.
So the 'REP REP MOVS' and 'repz retq' have been clearly documented
as unpredictable for a long time.
The only plausible two-prefix instruction I can think of is an exchange
with a segment override:
LOCK XCHG ES:FOO,AX
The assembler will generate the lock prefix first. I doubt they gave
much thought to what would happen if the prefixes were in the other
order.
For MOVS, a segment override prefix overrides the implicit DS: of the
source operand; the segment of the destination (implicitly ES:) cannot
be overridden. The page on REP above says nothing about segment
override limitations, so I expect that this limitation was dropped in
the 386 and later processors (probably already in the 286, where the
idea was to use segment registers (and overrides) a lot).
If you or Intel want to reserve some encoding space, the way to do it
is to either trap on the encoding, or treat it as noop. The noops are
for encodings that you later want to define as hints, because hints architecturally are noops.
- anton
According to EricP <[email protected]>:
These prefix rules were added relatively recently (maybe last 10 years?).
While they only allow one prefix from each of Group 1..4,
they still allow prefix bytes to be in any order thereby wasting
much opcode space on redundant permutations and combinations.
Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
I have the October 1979 8086 Family User's Manual here. (The actual
paper one, not a scan.)
In the discussion of repeat prefixes, it says they're interruptible,
and if a second or third segment or lock prefix is present it won't
work because it only remembers one prefix for the interrupt. You can
turn off interrupts, but an NMI might still break stuff.
The only plausible two-prefix instruction I can think of is an exchange
with a segment override:
LOCK XCHG ES:FOO,AX
The assembler will generate the lock prefix first. I doubt they gave
much thought to what would happen if the prefixes were in the other
order.
Anton Ertl wrote:
If you or Intel want to reserve some encoding space, the way to do it
is to either trap on the encoding, or treat it as noop. The noops are
for encodings that you later want to define as hints, because hints
architecturally are noops.
No, for future compatibility, you can only raise exceptions on unrecognized
bit patterns--otherwise you add future undefined behavior to your architecture.
Taking unrecognized things as NoOps is a sure way to shoot yourself in the foot
with a very slow and very painful bullet.
EricP <[email protected]> writes:
Actually the prefix rules go back farther - they are present in
an Intel x86 instruction manual from 2001 I had on a backup.
Older backups are not readily accessible.
The iAPX 86,88 manual from 1981 states when discussing REP/REPE/REPNE
in the context of interrupts:
"The processor 'remembers' only one prefix in effect
at the time of the interrupt, the prefix that immediately precedes
the string instruction."
Which implies that segment overrides in conjunction with a repeat
prefix won't be preserved if the MOVS is interrupted (they suggest
CLI/STI during string operations with segment override(s), noting
that won't help if an NMI occurs).
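A toy Python model of that statement, assuming (as the quoted manual text says) that only the prefix immediately preceding the string instruction survives an interrupt. The function name and the prefix table are illustrative only, and this is not a cycle-accurate description of the 8086.

```python
# Toy model only: which prefixes are still in effect when an interrupted
# string instruction resumes, if just the last prefix is remembered.
PREFIXES = {
    0xF3: "REP", 0xF2: "REPNE", 0xF0: "LOCK",
    0x26: "ES:", 0x2E: "CS:", 0x36: "SS:", 0x3E: "DS:",
}

def resumed_prefixes(code):
    """Return the prefixes that survive an interrupt under the one-prefix rule."""
    prefixes = []
    for b in code:
        if b not in PREFIXES:
            break  # reached the opcode byte
        prefixes.append(PREFIXES[b])
    return prefixes[-1:]  # only the prefix right before the opcode is kept

print(resumed_prefixes(bytes([0x26, 0xF3, 0xA4])))  # ['REP']  (segment override lost)
print(resumed_prefixes(bytes([0xF3, 0x26, 0xA4])))  # ['ES:']  (repeat lost)
```

Either ordering loses something on resume, which is why the manual suggests disabling interrupts around such sequences.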