Forum: >>> Magnum BBS <<<

Privilege Levels Below User

From John Savard@21:1/5 to All on Fri Jun 7 12:03:03 2024

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

The first reduced-privilege state would not allow any branch
instructions, particularly conditional branches.

The second, in addition, would not allow any access to memory, only
allowing access to registers.

To use these states to aid in security, more is required.

For one thing, blocks of memory would need to be able to be marked as
not only containing code or data, but as containing code that runs at
one of these reduced privilege levels.

And then comes the payoff: a block of memory could be marked as
writeable, but yet containing executable code, for things like
just-in-time compilation... but as only containing code at one of
these reduced privilege levels. Thus preventing the generation of code containing branches or memory accesses, as desired, while allowing the generation of computational sequences.
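The page marking can be sketched as the check the hardware would apply to each instruction fetched from a marked page; a JIT could run the same check on its output before using it. The instruction classes, the toy opcode table and the level names below are assumptions made for illustration, not part of any real ISA.

```python
from enum import Enum, auto


class Cls(Enum):
    ALU = auto()     # register-to-register computation
    MEMORY = auto()  # loads and stores
    BRANCH = auto()  # any transfer of control


# Instruction classes each (hypothetical) page marking permits.
ALLOWED = {
    "user": {Cls.ALU, Cls.MEMORY, Cls.BRANCH},
    "no_branch": {Cls.ALU, Cls.MEMORY},
    "regs_only": {Cls.ALU},
}

# Toy opcode table standing in for a real instruction decoder.
OPCLASS = {
    "add": Cls.ALU, "mul": Cls.ALU, "xor": Cls.ALU,
    "load": Cls.MEMORY, "store": Cls.MEMORY,
    "beq": Cls.BRANCH, "jmp": Cls.BRANCH,
}


def first_violation(code, page_marking):
    """Index of the first instruction that would trap, or None."""
    allowed = ALLOWED[page_marking]
    for i, op in enumerate(code):
        if OPCLASS[op] not in allowed:
            return i
    return None


jit_output = ["mul", "add", "xor", "load"]
print(first_violation(jit_output, "no_branch"))  # None
print(first_violation(jit_output, "regs_only"))  # 3
```

Under this model a writable page marked `regs_only` could hold JIT output but could never hold a branch or a load that would actually execute.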

John Savard

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to John Savard on Fri Jun 7 18:18:33 2024

John Savard writes:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the
existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2 (Hypervisor), EL1 (Kernel), EL0 (user)

<snip description of useless feature>

From MitchAlsup1@21:1/5 to John Savard on Fri Jun 7 20:40:34 2024

John Savard wrote:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the existence of this feature.

Most of the miscreations have to do with allowing microarchitectural
state to become visible through a high precision timing mechanism,
not with the skirting of privilege.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

In My 66000, the Monitor, Hypervisor, Supervisor, and guest can
share the dynamic libraries containing no privileged instructions.
And since there is only 1 such instruction it is easy to check.

However, a Pthread can transfer control to another Pthread without
privilege in a single instruction.

The first reduced-privilege state would not allow any branch
instructions, particularly conditional branches.

Are My 66000 predication shadows considered "branching" since they
do not alter where the Fetch end of the pipeline is working??

Are My 66000 Switch instructions considered branches ?? since the
transfer table is in .text and relative to the current switch
instruction?

Are Supervisor Calls "branches" since they go to controlled entry
points??

Are Supervisor Returns "branches" since they go to controlled
return points ??

The second, in addition, would not allow any access to memory, only
allowing access to registers.

To use these states to aid in security, more is required.

For one thing, blocks of memory would need to be able to be marked as
not only containing code or data, but as containing code that runs at
one of these reduced privilege levels.

How are you going to perform elementary functions {SIN, COS, EXP, LOG}?

And then comes the payoff: a block of memory could be marked as
writeable, but yet containing executable code, for things like
just-in-time compilation...

A C compiler is an application running in a different process. Why
is a JIT "not like that" ??

but as only containing code at one of
these reduced privilege levels. Thus preventing the generation of code containing branches or memory accesses, as desired, while allowing the generation of computational sequences.

John Savard

From MitchAlsup1@21:1/5 to Scott Lurndal on Fri Jun 7 20:43:23 2024

Scott Lurndal wrote:

John Savard writes:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the
existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM

I count 5 (unused privilege levels are not real privilege levels)

AMD: Ring 3, Ring 2 (unused), Ring 1(unused), Ring 0, SVM, SMM

I count 4

ARM64: Realm Monitor, EL3 (Secure monitor), EL2 (Hypervisor), EL1 (Kernel), EL0 (user)

I count 5
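The counts above follow a single rule: a level listed as unused is not counted. Written out as a small Python sketch (the lists are copied from Scott's post and the rule is Mitch's):

```python
levels = {
    "Intel": ["Ring 3", "Ring 2 (unused)", "Ring 1 (unused)", "Ring 0",
              "VMX", "Enclave", "SMM"],
    "AMD": ["Ring 3", "Ring 2 (unused)", "Ring 1 (unused)", "Ring 0",
            "SVM", "SMM"],
    "ARM64": ["Realm Monitor", "EL3", "EL2", "EL1", "EL0"],
}


def count_real(names):
    """Count privilege levels, skipping any listed as unused."""
    return sum(1 for name in names if "(unused)" not in name)


counts = {arch: count_real(names) for arch, names in levels.items()}
print(counts)  # {'Intel': 5, 'AMD': 4, 'ARM64': 5}
```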

<snip description of useless feature>

From Lawrence D'Oliveiro@21:1/5 to John Savard on Sat Jun 8 00:06:41 2024

On Fri, 07 Jun 2024 12:03:03 -0600, John Savard wrote:

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return -
the return instruction being limited, sort of like a supervisor call, so
it can only return in a proper manner.

MULTICS lives!

That was the next-generation kitchen-sink OS from the latter 1960s that
was taking so long to develop that Bell Labs pulled out of the project and set about creating their own, much less ambitious OS instead, which they
initially called “UNICS” (to indicate it was the opposite of “MULTICS”).

MULTICS required hardware with 8 different privilege levels (rings), from
0 (most privileged) to 7 (least privileged).

User code normally ran at ring 4. This left 5, 6 and 7 available for
ordinary users to impose their own additional isolation on code they
didn’t quite trust.
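The isolation scheme above can be sketched with a deliberately simplified rule: lower ring numbers are more privileged, and code running in ring r may touch a segment only if that segment's ring number is r or higher. Real Multics attached read, write and execute ring brackets to each segment and used gate segments for calls into more privileged rings; none of that is modelled here, and the function name is hypothetical.

```python
USER_RING = 4  # where ordinary user code ran, per the post above


def may_access(executing_ring: int, segment_ring: int) -> bool:
    """Simplified check: a ring may reach its own and less privileged rings."""
    if not (0 <= executing_ring <= 7 and 0 <= segment_ring <= 7):
        raise ValueError("rings run from 0 to 7")
    return executing_ring <= segment_ring


sandbox_ring = 6  # a user parks code they don't quite trust here
print(may_access(USER_RING, sandbox_ring))  # True: user code can inspect the sandbox
print(may_access(sandbox_ring, USER_RING))  # False: sandboxed code cannot touch ring-4 data
```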

Another option, less of a hierarchy and more of a privilege matrix, would
be to use capabilities. I think I mentioned CHERI in this newsgroup
previously.

From John Dallman@21:1/5 to D'Oliveiro on Sat Jun 8 10:06:00 2024

Lawrence D'Oliveiro wrote:

On Fri, 07 Jun 2024 12:03:03 -0600, John Savard wrote:

So I am thinking it might be useful to have, for example, two
states less privileged than the user state, and some mechanism
for user programs to call subroutines which are in that state
until they return - the return instruction being limited, sort
of like a supervisor call, so it can only return in a proper
manner.

As a practical matter, ISA features requiring assembly coding are not accessible to application programmers these days, because they mostly
don't know assembler and are scared by the idea. They also don't want to
know about privilege levels. Such features are accessible to compiler
writers, JIT creators and OS creators, but introducing additional
complexity into their work is not welcome.

If this feature existed and worked, it would hardly be used at all.

User code normally ran at ring 4. This left 5, 6 and 7 available
for ordinary users to impose their own additional isolation on code
they didn't quite trust.

Was this used? People were much more willing to do low-level programming
in those days, but I bet this went unused.

John

From John Levine@21:1/5 to All on Sat Jun 8 10:17:36 2024

According to Lawrence D'Oliveiro:

That was the next-generation kitchen-sink OS from the latter 1960s that
was taking so long to develop that Bell Labs pulled out of the project and set
about creating their own, much less ambitious OS instead, which they
initially called “UNICS” (to indicate it was the opposite of “MULTICS”).

Bell Labs did indeed give up on Multics, but Unix was an unofficial
skunkworks project and the name is a joke, a castrated Multics. This
is well documented in many Unix history papers.

On the other hand, if Multics hadn't been so late, and so closely tied
to expensive hardware that wasn't byte addressable and was already
running out of address bits, who knows how much of its other features
might have been more widely adopted. It was pretty cool that you could
write your code one piece at a time, and if you called a routine that
didn't exist, it would stop, you could write and compile the routine,
and then continue the original program.

--
Regards,
John Levine, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From EricP@21:1/5 to Scott Lurndal on Sat Jun 8 12:01:56 2024

Scott Lurndal wrote:

John Savard writes:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the
existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2(Hypervisor), EL1 (Kernel), EL0 (user)

VAX had 4 modes, User, Supervisor, Executive, Kernel.
VMS used Super for debugger and the command language DCL,
Exec was mostly for the file system.
Kernel was for the core of the OS.

What they found was that not only did they not need 4 levels,
but it was pointless overhead to have to constantly switch between them.
(There is a pretty high penalty to switching modes, copying in args,
validating args, doing something usually simple, then switching back,
when it is all the OS's code anyway.)

I don't know what privileges Unix on VAX used but it was
probably 2 levels because PDP-11 had only 2 levels.

Alpha had 3 levels, User, Supervisor, and a higher third mode called
PAL for Privileged Architecture Library. It was supposed to be thought
of like microcode, privileged subroutines. Then PAL mode was used to
emulate the 4 levels that VMS expected when they ported it.

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,
and therefore someone can't take DEC's executables and run them
on a clone processor, like what happened to IBM with Amdahl.)

WinNT was written to be portable so the lowest common denominator
is 2 levels, User and Super, and everything worked just fine.

From MitchAlsup1@21:1/5 to EricP on Sat Jun 8 17:37:46 2024

EricP wrote:

Scott Lurndal wrote:

John Savard writes:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the
existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2 (Hypervisor), EL1 (Kernel), EL0 (user)

VAX had 4 modes, User, Supervisor, Executive, Kernel.
VMS used Super for debugger and the command language DCL,
Exec was mostly for the file system.
Kernel was for the core of the OS.

What they found was that not only did they not need 4 levels,
but it was pointless overhead to have to constantly switch between them.
(There is a pretty high penalty to switching modes, copying in args, validating args, doing something usually simple, then switching back,
when it is all the OS's code anyway.)

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

But for similar reasons ring 1 and 2 are not used in x86 machines,
either. {{Now, if we could just go back to 1982 and not invent
IDTs, and call gates, .....}}

I don't know what privileges Unix on VAX used but it was
probably 2 levels because PDP-11 had only 2 levels.

Alpha had 3 levels, User, Supervisor, and a higher third mode called
PAL for Privileged Architecture Library. It was supposed to be thought
of like microcode, privileged subroutines. Then PAL mode was used to
emulate the 4 levels that VMS expected when they ported it.

PAL was microcode in <fast> ROM in the native ISA.

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,
and therefore someone can't take DEC's executables and run them
on a clone processor, like what happened to IBM with Amdahl.)

Worked real well for them !!

WinNT was written to be portable so the lowest common denominator
is 2 levels, User and Super, and everything worked just fine.

From Lawrence D'Oliveiro@21:1/5 to John Levine on Sun Jun 9 00:21:01 2024

On Sat, 8 Jun 2024 10:17:36 -0000 (UTC), John Levine wrote:

On the other hand, if Multics hadn't been so late, and so closely tied
to expensive hardware that wasn't byte addressable and was already
running out of address bits, who knows how much of its other features
might have been more widely adopted.

Multics did finally reach production, and was still available for purchase
into the 1980s. There is a brochure from Honeywell (who took over the GE computer business), dated 1982, that touts its features. And considering
it had been something like 15 years since development commenced at that
point, it doesn’t look dated at all. What other platforms from that time
had integrated graphics, text processing and database support?

I heard somewhere that Honeywell had no idea what to do with this MULTICS thing. They priced it at something obscene (seven figures) to put off
people buying it, but some customers stayed loyal, regardless.

From MitchAlsup1@21:1/5 to BGB on Sun Jun 9 00:29:42 2024

BGB wrote:

On 6/8/2024 11:01 AM, EricP wrote:

Scott Lurndal wrote:

John Savard writes:

Though, the time returned by the CPUID microsecond timer is not
currently the same as the one given by "TK_GetTimeUS()", where the
latter effectively gives a 64-bit value (conceptually) representing the
number of microseconds since 1/1/1970; though with the kernel currently
assuming that its build-time is the starting time for the clock (and
none of the FPGA boards support a hardware clock, and one would need
internet access to use NTP, ...).

A 64-bit value in microseconds can express around +/- 300k years, which
should be plenty.

What do you do when you need a 200 picosecond timer ?? (5GHz cycle
counter)

A 64-bit value expressed in seconds could express values relative to
the current age of the universe, but this is likely unnecessary for most
purposes, and ability to express fractions of a second is likely more
useful than the ability to express the age of the universe.

Interesting factoid::
The universe is currently 10^80 Planck times old since Big Bang,
and universe will die around 10^80 years,
and there are about 10^80-10^88 particles in the universe.

Granted, one could use a 128-bit value, and have both (and in
picoseconds if they wanted). But, this would be overkill.

Or, go extra overkill, and use 256 bits, to express the current age of
the universe in Planck units...

160 bits will be shown to be sufficient to count Planck times.
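The ranges traded in this exchange can be checked with a few lines of arithmetic. The constants are assumptions not given in the thread: a 365.25-day year, a Planck time of about 5.39e-44 s (CODATA 2018) and an age for the universe of about 13.8 billion years. On those figures the universe is roughly 8e60 Planck times old, and a counter for that many ticks needs 203 bits.

```python
# Assumed constants (not from the thread):
YEAR_S = 365.25 * 86400           # Julian year, in seconds
PLANCK_TIME_S = 5.391247e-44      # CODATA 2018 Planck time, in seconds
UNIVERSE_AGE_S = 13.8e9 * YEAR_S  # commonly cited age of the universe


def span_years(bits, tick_s, signed=True):
    """Largest span, in years, that a counter of `bits` bits ticking every
    `tick_s` seconds can represent (in one direction, if signed)."""
    ticks = 2 ** (bits - 1 if signed else bits)
    return ticks * tick_s / YEAR_S


us_64 = span_years(64, 1e-6)                  # about 292,000 years each way
cycles_5ghz = span_years(64, 200e-12, False)  # unsigned: wraps after about 117 years
s_64 = span_years(64, 1.0)                    # about 2.9e11 years each way
ps_128 = span_years(128, 1e-12)               # about 5.4e18 years each way

planck_ticks = int(UNIVERSE_AGE_S / PLANCK_TIME_S)  # roughly 8.1e60
bits_needed = planck_ticks.bit_length()             # 203
```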

From Lawrence D'Oliveiro@21:1/5 to All on Sun Jun 9 02:23:35 2024

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle, theoretical concept rather than a practical one.

The question was: was the instruction set defined so that code that was designed to run in a privileged mode could be run unprivileged, so that any
attempt to do privileged things would be trapped and emulated by the real privileged code, and so that there was nothing it could do to discover it wasn’t running in privileged mode?

(Obviously performance was not the issue here, but correctness was.)

For example, the VAX had a MOVPSL instruction that allowed read-only
access to the entire processor status register. Through this,
nonprivileged user-mode code could discover it was running in user mode,
which would blow the illusion.
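The requirement above can be sketched as a toy trap-and-emulate model. Both instruction names are stand-ins: `read_mode_priv` is a hypothetical privileged instruction that traps in user mode, while `movpsl` plays the role the post gives the VAX MOVPSL, an unprivileged status read that never traps and so reports the real mode.

```python
class PrivilegeTrap(Exception):
    """Raised when a privileged instruction is executed in user mode."""


class Cpu:
    def __init__(self, mode):
        self.mode = mode  # real hardware mode: "kernel" or "user"

    def execute(self, insn):
        if insn == "read_mode_priv":  # hypothetical privileged instruction
            if self.mode != "kernel":
                raise PrivilegeTrap(insn)
            return self.mode
        if insn == "movpsl":          # unprivileged: never traps
            return self.mode
        raise ValueError(f"unknown instruction {insn!r}")


class Vmm:
    """Runs a guest kernel in real user mode and emulates whatever traps."""

    def __init__(self):
        self.cpu = Cpu(mode="user")
        self.guest_mode = "kernel"  # the mode the guest believes it has

    def guest_execute(self, insn):
        try:
            return self.cpu.execute(insn)
        except PrivilegeTrap:
            # Trap-and-emulate: answer from the guest's virtual state.
            return self.guest_mode


vmm = Vmm()
print(vmm.guest_execute("read_mode_priv"))  # kernel (emulated, illusion holds)
print(vmm.guest_execute("movpsl"))          # user (not trapped, illusion broken)
```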

The Motorola 680x0 family was I think properly virtualizable in this
sense. Or maybe the 68020 and 68030 were, but the 68040 wasn't. I think the Motorola engineers working on the ’040 asked if any customers were
interested in preserving the self-virtualization feature, and nobody
seemed to care.

From Anton Ertl@21:1/5 to MitchAlsup1 on Sun Jun 9 12:25:44 2024

MitchAlsup1 writes:

EricP wrote:

Alpha had 3 levels, User, Supervisor, and a higher third mode called
PAL for Privileged Architecture Library. It was supposed to be thought
of like microcode, privileged subroutines. Then PAL mode was used to
emulate the 4 levels that VMS expected when they ported it.

PAL was microcode in <fast> ROM in the native ISA.

What is called when you perform a PAL call (at least on EV45, but most
likely on all Alphas) is Alpha code, and it resides in RAM and is
loaded there from the boot loader. I know, because I enhanced the PAL
code supplied with the MILO boot loader for EV45 to activate the full
16KB of D-cache (rather than just 8KB).

It also uses fewer special mechanisms than I expected; e.g., on the EV45 the IMB (instruction-memory barrier) PAL call is implemented by just executing
a big chunk of code such that the previous contents of the I-cache are
evicted, while I expected that it would set a bit in a model-specific
register.
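The eviction approach can be illustrated with a small simulation of a direct-mapped cache: running through a block of straight-line code at least as large as the cache touches every set once and displaces whatever was there before. The line size, cache size and addresses are assumptions chosen for illustration, not EV45 figures, and how the actual PAL code sized and placed its filler block is not described in the post.

```python
LINE = 32          # bytes per cache line (assumed)
SIZE = 8 * 1024    # cache capacity in bytes (assumed)
SETS = SIZE // LINE


def index_and_tag(addr):
    return (addr // LINE) % SETS, addr // SIZE


def fetch(cache, addr):
    """Bring the line holding addr into the cache (one line per set)."""
    index, tag = index_and_tag(addr)
    cache[index] = tag


def is_cached(cache, addr):
    index, tag = index_and_tag(addr)
    return cache[index] == tag


cache = [None] * SETS
old_code = 0x10000    # e.g. instructions that have just been overwritten
flush_code = 0x40000  # hypothetical block of filler code used by the PAL routine

for addr in range(old_code, old_code + SIZE, LINE):
    fetch(cache, addr)
stale_before = sum(is_cached(cache, a) for a in range(old_code, old_code + SIZE, LINE))

# "Executing" SIZE bytes of straight-line filler code fetches every set once.
for addr in range(flush_code, flush_code + SIZE, LINE):
    fetch(cache, addr)
stale_after = sum(is_cached(cache, a) for a in range(old_code, old_code + SIZE, LINE))

print(stale_before, stale_after)  # 256 0
```

The guarantee is this simple only for a direct-mapped cache; with other replacement policies the filler block would have to be sized with the policy in mind.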

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,
and therefore someone can't take DEC's executables and run them
on a clone processor, like what happened to IBM with Amdahl.)

Worked real well for them !!

Definitely. Note that the first Amdahl machine shipped 11 years after
the first S/360. Alpha was canceled 9 years after the first Alpha was
shipped.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup

From David Schultz@21:1/5 to Lawrence D'Oliveiro on Sun Jun 9 08:34:29 2024

On 6/8/24 9:23 PM, Lawrence D'Oliveiro wrote:

The Motorola 680x0 family was I think properly virtualizable in this
sense. Or maybe the 68020 and 68030 were, but the 68040 wasn't. I think the Motorola engineers working on the ’040 asked if any customers were interested in preserving the self-virtualization feature, and nobody
seemed to care.

The 68010 made the MOVE from SR instruction privileged.

CP/M-68K V1.2 added support for the 68010. The exception handler would
patch the code to change a MOVE from SR into a MOVE from CCR.
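The patch is cheap because the two instructions share their layout: both are a 10-bit opcode followed by the same 6-bit effective-address field, and they differ only in bit 9 of the first instruction word, so any extension words can stay as they are. A sketch of the rewrite, using the encodings from the 68000-family programmer's reference; how CP/M-68K's handler located the faulting instruction is not described in the post and is left out.

```python
MOVE_FROM_SR = 0x40C0   # 0100 0000 11 <ea>: privileged from the 68010 on
MOVE_FROM_CCR = 0x42C0  # 0100 0010 11 <ea>: added in the 68010, unprivileged
OPCODE_MASK = 0xFFC0    # upper 10 bits; the low 6 bits are the effective address
EA_MASK = 0x003F


def patch_move_from_sr(word: int) -> int:
    """Rewrite the first word of a MOVE from SR into MOVE from CCR,
    keeping the same destination effective address."""
    if (word & OPCODE_MASK) != MOVE_FROM_SR:
        raise ValueError(f"not a MOVE from SR: {word:#06x}")
    return MOVE_FROM_CCR | (word & EA_MASK)


print(hex(patch_move_from_sr(0x40C0)))  # MOVE SR,D0    -> 0x42c0 (MOVE CCR,D0)
print(hex(patch_move_from_sr(0x40E7)))  # MOVE SR,-(SP) -> 0x42e7 (MOVE CCR,-(SP))
```

Because the code is rewritten in place, each such instruction would trap at most once.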

--
http://davesrocketworks.com
David Schultz

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sun Jun 9 14:13:25 2024

Lawrence D'Oliveiro writes:

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle,
theoretical concept rather than a practical one.

I'm quite sure that IBM would disagree with this statement.

From John Savard@21:1/5 to All on Sun Jun 9 09:14:11 2024

On Fri, 7 Jun 2024 20:40:34 +0000, MitchAlsup1 wrote:

Are Supervisor Calls "branches" since they go to controlled entry
points??

Well, they're a kind of subroutine call. But they're really
instructions that initiate the computer's response to an interrupt,
which is what makes the entry point controlled and the instruction
able to increase privilege.

How are you going to perform elementary functions {SIN, COS, EXP, LOG}?

Just because the feature exists doesn't mean it needs to be used for everything. Ordinary subroutine calls will still exist, so if these
routines require scratchpad memory, that will be fine.
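To make the question concrete: the polynomial core of SIN can be straight-line register arithmetic with no branches and no loads, provided the coefficients can be encoded as immediates. The sketch below uses plain Taylor coefficients and a simple range reduction, which is an assumption for illustration; production libraries use minimax coefficients and far more careful reduction for large arguments. Python values live in memory, of course; the code models the instruction sequence, not the hardware.

```python
import math

# Coefficients (-1)**n / (2n+1)! for n = 0..8; in the hypothetical reduced
# mode these would be instruction immediates, not a table in memory.
K0, K1, K2, K3, K4, K5, K6, K7, K8 = (
    (-1) ** n / math.factorial(2 * n + 1) for n in range(9)
)
TWO_PI = 2 * math.pi


def sin_straight_line(x: float) -> float:
    # Range reduction to about [-pi, pi] by rounding, not by comparisons.
    r = x - TWO_PI * round(x / TWO_PI)
    r2 = r * r
    # Horner's scheme in r*r, unrolled so there is no loop back-edge.
    p = K8
    p = p * r2 + K7
    p = p * r2 + K6
    p = p * r2 + K5
    p = p * r2 + K4
    p = p * r2 + K3
    p = p * r2 + K2
    p = p * r2 + K1
    p = p * r2 + K0
    return r * p
```

With |r| at most pi, the first omitted term is below about 2.3e-8, so the kernel agrees with `math.sin` to roughly seven decimal places over moderate arguments.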

A C compiler is an application running in a different process. Why
is a JIT "not like that" ??

A C compiler doesn't save data in memory that can then be executed. It
writes to a file. The linking loader, instead, is "like" a JIT
compiler in that respect.

John Savard

From John Savard@21:1/5 to John Savard on Sun Jun 9 09:26:42 2024

On Fri, 07 Jun 2024 12:03:03 -0600, John Savard wrote:

The first reduced-privilege state would not allow any branch
instructions, particularly conditional branches.

The second, in addition, would not allow any access to memory, only
allowing access to registers.

Maybe I haven't made clear what this is _for_ as I thought it would be
obvious.

If no branches... then no need for retpolines and stuff.

If no access to memory... no worries about rowhammer.

Given that, a third mode - not reduced-privilege so much as
reduced-efficiency - suggests itself.

Cause some code to be executed... without any speculative execution;
allow branches, but don't execute anything until where the branch goes
is fully resolved.

This deals with Spectre and friends.
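For comparison, here is the sort of software mitigation such a mode would make unnecessary. The sketch is modelled on the generic `array_index_mask_nospec()` idea in the Linux kernel: the bounds check still exists as a branch, but the index is also masked with plain arithmetic, so a mispredicted path cannot use an out-of-range value. Python cannot show speculation itself; the model only demonstrates that the mask is computed without a comparison, assuming 64-bit words and `index`, `size` below 2**63.

```python
MASK64 = (1 << 64) - 1


def index_mask(index: int, size: int) -> int:
    """All ones if 0 <= index < size, else 0, using only arithmetic.
    Mirrors ~(index | (size - 1 - index)) >> 63 on 64-bit words."""
    in_bounds = (~(index | (size - 1 - index)) & MASK64) >> 63
    return (-in_bounds) & MASK64


def load_nospec(table, index):
    if not 0 <= index < len(table):  # the architectural bounds check
        raise IndexError(index)
    # Even if the branch above were mispredicted, the masked index stays in range.
    return table[index & index_mask(index, len(table))]
```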

So the idea is to give an unprivileged user application, like a web
browser, a capability, without going through the operating system, to
run code that is sandboxed in appropriate ways to prevent it from
causing trouble although it is untrusted.

That browsers have to be able to run untrusted JavaScript (and,
formerly, even Java and Flash, which have now been discarded) to
support the flexibility desired for modern web sites... has been the
basic reason why computers today are insecure. If the only code that
ran on computers was trusted code, then the virus situation would be
like it was back in the days of 8-bit computers; except for
supply-chain attacks, just don't run pirated software, and you're
pretty much safe.

John Savard

From John Savard@21:1/5 to All on Sun Jun 9 09:16:44 2024

On Fri, 07 Jun 2024 18:18:33 GMT, Scott Lurndal wrote:

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2 (Hypervisor), EL1 (Kernel), EL0 (user)

Yes, but these are multiple levels _higher_ than User, and what I was
talking about were levels *lower* than User, so I fail to see how this indicates my idea isn't new.

Or perhaps your complaint is simply that we have too many levels
already. But that's somebody else's fault, and doesn't bear on whether
the feature I suggest might be useful.

John Savard

From John Savard@21:1/5 to Anton Ertl on Sun Jun 9 11:10:17 2024

On Sun, 09 Jun 2024 16:52:45 GMT, Anton Ertl wrote:

The proper answer to hardware bugs is not adding software limitations,
nor software mitigations (what the hardware makers suggest), but to
fix the hardware.

In the case of Spectre, fixing the hardware has a cost in performance.
So allowing the processor to run code with out-of-order execution
turned off for that code is a way to limit the performance loss to the untrusted code.

And this would work well on my Concertina II architecture, where VLIW
features, such as the break bit, and extended register banks of 128
registers each, are present. Code can be generated that avoids
register hazards when run in order.
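A sketch of the kind of grouping an in-order compiler might do: consecutive instructions stay in one group until one of them reads or writes a register already written earlier in the group, and a break is placed there. The `(dest, sources)` representation and the rule itself are assumptions, not taken from the Concertina II specification. Only read-after-write and write-after-write conflicts are checked, on the assumption that all reads in a group happen before any writes.

```python
def group_with_breaks(insns):
    """insns: list of (dest, sources) register tuples, in program order.
    Returns groups of instruction indices; a break bit would follow each group."""
    groups, current, written = [], [], set()
    for i, (dest, sources) in enumerate(insns):
        if written & (set(sources) | {dest}):
            groups.append(current)  # hazard: end the group here
            current, written = [], set()
        current.append(i)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


code = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r5", "r6")),  # r4 = r5 + r6  (independent: same group)
    ("r7", ("r1", "r4")),  # r7 = r1 + r4  (needs both results: break)
    ("r8", ("r2", "r2")),  # r8 = r2 * r2  (independent of r7)
]
print(group_with_breaks(code))  # [[0, 1], [2, 3]]
```

With a large register file the compiler can give independent values distinct registers, so fewer conflicts force a break.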

John Savard

From Anton Ertl@21:1/5 to John Savard on Sun Jun 9 16:52:45 2024

John Savard writes:

If no branches... then no need for retpolines and stuff.

If no access to memory... no worries about rowhammer.

The proper answer to hardware bugs is not adding software limitations,
nor software mitigations (what the hardware makers suggest), but to
fix the hardware.

Given that, a third mode - not reduced-privilege so much as
reduced-efficiency - suggests itself.

That would be one fix, but fixes that cost less performance are
possible.

Cause some code to be executed... without any speculative execution;
allow branches, but don't execute anything until where the branch goes
is fully resolved.

This deals with Spectre and friends.

So the idea is to give an unprivileged user application, like a web
browser, a capability, without going through the operating system, to
run code that is sandboxed in appropriate ways to prevent it from
causing trouble although it is untrusted.

That browsers have to be able to run untrusted JavaScript

In general JavaScript cannot be executed without branches nor without
memory accesses. Therefore your modes will not be used for
JavaScript.

has been the
basic reason why computers today are insecure.

There is certainly something to that, even without hardware bugs.
JavaScript offers a huge attack surface, and lots of software-only vulnerabilities have been found in JavaScript engines over the
decades. One way to deal with that problem is to disable JavaScript.

But JavaScript and hardware bugs are not the only security problems on computers today.

If the only code that
ran on computers was trusted code, then the virus situation would be
like it was back in the days of 8-bit computers; except for
supply-chain attacks, just don't run pirated software, and you're
pretty much safe.

That's naive. All kinds of "trusted" software has vulnerabilities,
and hardware bugs make things worse.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup

From MitchAlsup1@21:1/5 to John Savard on Sun Jun 9 18:21:32 2024

John Savard wrote:

On Sun, 09 Jun 2024 16:52:45 GMT, Anton Ertl wrote:

The proper answer to hardware bugs is not adding software limitations,
nor software mitigations (what the hardware makers suggest), but to
fix the hardware.

In the case of Spectre, fixing the hardware has a cost in performance.

It does not have to have any performance cost.

So allowing the processor to run code with out-of-order execution
turned off for that code is a way to limit the performance loss to the untrusted code.

I, personally, do not trust any code; application of supervision.

And this would work well on my Concertina II architecture, where VLIW features, such as the break bit, and extended register banks of 128
registers each, are present. Code can be generated that avoids
register hazards when run in order.

John Savard

From MitchAlsup1@21:1/5 to John Savard on Sun Jun 9 18:19:07 2024

John Savard wrote:

On Fri, 07 Jun 2024 12:03:03 -0600, John Savard wrote:

The first reduced-privilege state would not allow any branch
instructions, particularly conditional branches.

The second, in addition, would not allow any access to memory, only
allowing access to registers.

Maybe I haven't made clear what this is _for_ as I thought it would be obvious.

If no branches... then no need for retpolines and stuff.

My 66000 needs no retpolines for external calls/returns or for
SVCs and SVRs.

If no access to memory... no worries about rowhammer.

Rowhammer can be eliminated without restricting access to memory.

Given that, a third mode - not reduced-privilege so much as reduced-efficiency - suggests itself.

Cause some code to be executed... without any speculative execution;
allow branches, but don't execute anything until where the branch goes
is fully resolved.

This deals with Spectre and friends.

Spectré exploits the inability to keep microarchitectural state hidden
from architectural state. Design the pipeline correctly and you won't have
Spectré, Meltdown, or friends...

So the idea is to give an unprivileged user application, like a web
browser, a capability, without going through the operating system, to
run code that is sandboxed in appropriate ways to prevent it from
causing trouble although it is untrusted.

That browsers have to be able to run untrusted JavaScript (and,
formerly, even Java and Flash, which have now been discarded) to
support the flexibility desired for modern web sites... has been the
basic reason why computers today are insecure. If the only code that
ran on computers was trusted code, then the virus situation would be
like it was back in the days of 8-bit computers; except for
supply-chain attacks, just don't run pirated software, and you're
pretty much safe.

John Savard

From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Sun Jun 9 22:38:34 2024

On Sun, 09 Jun 2024 12:25:44 GMT, Anton Ertl wrote:

It also uses fewer specials than I expected; e.g., on the EV45 the IMB (instruction-memory barrier) PAL call is implemented by just executing
a big chunk of code such that the previous contents of the I-cache are evicted, while I expected that it would set a bit in a model-specific register.

I find that somehow amusing and horrifying at the same time ...

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Sun Jun 9 22:41:12 2024

On Sun, 09 Jun 2024 14:13:25 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle,
theoretical concept rather than a practical one.

I'm quite sure that IBM would disagree with this statement.

I’m sure they would. But they invented virtualization in CP/CMS because
their attempt at an “interactive timesharing” system, CMS, was only single-user. Rather than make it multiuser, they simply invented CP as a
big hack to run multiple copies of CMS, so each user felt they had an
entire machine to themself.

There were some privilege holes in that, as well.

From Lawrence D'Oliveiro@21:1/5 to BGB on Sun Jun 9 22:45:43 2024

On Sun, 9 Jun 2024 12:36:09 -0500, BGB wrote:

OTOH: A 32-bit value in seconds will overflow in 2038, so isn't really sufficient at this point.

A signed 32-bit value overflows in 2038, an unsigned value gives you a
little bit more breathing room.
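To see where those two limits fall, here is a small C sketch (not from the thread) that turns a count of seconds since 1970-01-01 UTC into a calendar date using Howard Hinnant's days-to-civil algorithm, so the result does not depend on the width of the host's own time_t. The names `struct civil` and `to_civil` are made up for this illustration, and leap seconds are ignored, as POSIX time does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a broken-down UTC date and time. */
struct civil {
    int64_t year;
    int month, day, hour, min, sec;
};

/* Floor division, so negative (pre-1970) times round toward -infinity. */
static int64_t floor_div(int64_t a, int64_t b) {
    int64_t q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

/* Seconds since 1970-01-01T00:00:00Z -> proleptic Gregorian date/time,
   after H. Hinnant's civil_from_days. */
static struct civil to_civil(int64_t secs) {
    struct civil c;
    int64_t days = floor_div(secs, 86400);
    int64_t rem = secs - days * 86400;            /* 0 .. 86399 */
    c.hour = (int)(rem / 3600);
    c.min = (int)(rem / 60 % 60);
    c.sec = (int)(rem % 60);

    int64_t z = days + 719468;                    /* shift epoch to 0000-03-01 */
    int64_t era = floor_div(z, 146097);           /* 400-year eras */
    int64_t doe = z - era * 146097;               /* day of era, 0 .. 146096 */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    int64_t mp = (5 * doy + 2) / 153;             /* month index, March = 0 */
    c.day = (int)(doy - (153 * mp + 2) / 5 + 1);
    c.month = (int)(mp < 10 ? mp + 3 : mp - 9);
    c.year = yoe + era * 400 + (c.month <= 2);
    assert(c.month >= 1 && c.month <= 12 && c.day >= 1 && c.day <= 31);
    return c;
}
```

With it, INT32_MAX lands on 2038-01-19 03:14:07 UTC, INT32_MIN (where a signed counter wraps to) on 1901-12-13 20:45:52 UTC, and UINT32_MAX on 2106-02-07 06:28:15 UTC, which is the extra breathing room an unsigned field buys.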

32-bit builds of the Linux kernel already offer the option for a 64-bit
time_t. And Debian, for one, is currently in the middle of transitioning
its 32-bit builds to that.

How can I tell, on my 64-bit system? Because all the affected packages
have acquired “t64” suffixes on their names, and this applies to all architectures. I think this is temporary, though: once everything is fully compatible again, the “t64” suffixes will disappear. Presumably in time
for the next stable release.

From Lawrence D'Oliveiro@21:1/5 to John Savard on Sun Jun 9 22:48:28 2024

On Sun, 09 Jun 2024 09:14:11 -0600, John Savard wrote:

A C compiler doesn't save data in memory that can then be executed. It
writes to a file.

That’s an implementation issue. Back in the day, there were such things as “load-and-go” compilers. E.g. the Waterloo Fortran that I used in some undergraduate courses. I’m sure people would have done ones for C.

These days, there are things called “JIT” (“just-in-time”) compilers.

And just as a further nitpick (if the above weren’t enough), what happens
if the “file” your C compiler is writing to is in a RAM disk?

From Terje Mathisen@21:1/5 to BGB on Mon Jun 10 07:53:08 2024

BGB wrote:

Though, there are some instructions which are currently allowed in user
mode but which it could make sense to trap in some contexts, such as
CPUID, or potentially just parts of CPUID, ...

Say, for example, CPUID has several pieces of information available:
CPU type and features;
Microsecond timer (local);
Clock cycle timer;
Hardware RNG;
...

In various contexts, it may be reasonable to want to trap and emulate
some of these while still allowing others to be unhindered.

Yeah.

Though, the time returned by the CPUID microsecond timer is not
currently the same as the one given by "TK_GetTimeUS()", where the
latter effectively gives a 64-bit value (conceptually) representing the number of microseconds since 1/1/1970; though with the kernel currently assuming that its build-time is the starting time for the clock (and
none of the FPGA boards support a hardware clock, and one would need internet access to use NTP, ...).

A 64-bit value in microseconds can express around +/- 300k years, which should be plenty.
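BGB's range estimate is easy to check: divide the largest signed 64-bit value by the tick rate and by the length of an average Gregorian year (365.2425 days). A minimal C sketch; `span_years` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 s. */
#define SECONDS_PER_YEAR 31556952LL

/* Whole years a signed 64-bit tick counter spans on each side of its epoch,
   for a given number of ticks per second (rounded down). */
static int64_t span_years(int64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    return INT64_MAX / ticks_per_second / SECONDS_PER_YEAR;
}
```

span_years(1000000) gives 292277 for microsecond ticks, span_years(10000000) gives 29227 for 100 ns ticks, and span_years(1000000000) gives 292 for nanoseconds, which is why a signed 64-bit nanosecond count from 1970 runs out in 2262.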

Experience has shown that microsecond resolution is NOT good enough,
i.e. GPS timing receivers can typically give you ~25 ns RMS accuracy for
less than $100.

WinNT settled on 64-bit 100 ns ticks from 1601-01-01, that has turned
out to be pretty good, but (see above) not quite good enough for all uses.
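Converting between the Windows epoch (FILETIME counts 100 ns intervals since 1601-01-01 UTC) and the Unix epoch is a fixed offset plus a change of units. This sketch assumes the count is held as a plain signed 64-bit integer (the real FILETIME structure splits it into two 32-bit DWORD halves) and that time_t is 64 bits; `filetime_to_timespec` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* 100 ns ticks per second, and seconds from 1601-01-01 to 1970-01-01 UTC
   (134774 days * 86400 s). */
#define TICKS_PER_SEC 10000000LL
#define EPOCH_DIFF_SECS 11644473600LL

/* Split a FILETIME-style tick count into Unix seconds and nanoseconds.
   Only non-negative tick counts are handled in this sketch. */
static struct timespec filetime_to_timespec(int64_t ticks) {
    assert(ticks >= 0);
    struct timespec ts;
    ts.tv_sec = (time_t)(ticks / TICKS_PER_SEC - EPOCH_DIFF_SECS);
    ts.tv_nsec = (long)(ticks % TICKS_PER_SEC * 100);
    return ts;
}
```

For example, 116444736000000000 ticks is exactly the Unix epoch, and adding 12345678 ticks gives 1 s plus 234567800 ns.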

Modern Unix typically provides 64-bit time_t seconds and a (effectively)
30-bit ns field, so you can store them in a 96-bit container but I don't
think anyone does that?
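If someone did want the 96-bit container, the portable way is to serialize the two fields byte by byte instead of relying on struct layout and padding. A minimal sketch: the names `pack96`/`unpack96` and the little-endian byte order are arbitrary choices, and the two spare bits above the 30-bit nanosecond field (2^30 = 1073741824 > 999999999) are simply left zero.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 64-bit seconds and a nanosecond field into 12 bytes, little-endian. */
static void pack96(uint8_t out[12], int64_t sec, uint32_t nsec) {
    assert(nsec < 1000000000u);
    uint64_t s = (uint64_t)sec;          /* two's-complement bit pattern */
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(s >> (8 * i));
    for (int i = 0; i < 4; i++)
        out[8 + i] = (uint8_t)(nsec >> (8 * i));
}

static void unpack96(const uint8_t in[12], int64_t *sec, uint32_t *nsec) {
    uint64_t s = 0;
    uint32_t n = 0;
    for (int i = 0; i < 8; i++)
        s |= (uint64_t)in[i] << (8 * i);
    for (int i = 0; i < 4; i++)
        n |= (uint32_t)in[8 + i] << (8 * i);
    *sec = (int64_t)s;   /* assumes two's complement, as on mainstream targets */
    *nsec = n;
}
```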

If you have a lot of such timestamps I would suggest you instead
truncate the time_t seconds field to just the classic 32 bits and use windowing around the current (full resolution) time.
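Terje's windowing suggestion is the same idea as RFC 1982 serial-number arithmetic: keep the low 32 bits, and when reading a value back pick the 64-bit candidate nearest a full-resolution reference time. A minimal C sketch, assuming every stored timestamp lies within about 68 years (2^31 s) of the reference; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit seconds count. */
static uint32_t truncate_secs(int64_t full) {
    return (uint32_t)(uint64_t)full;
}

/* Recover the full value from its low 32 bits by choosing the candidate
   closest to `now`; valid while |full - now| < 2^31 s. */
static int64_t widen_secs(uint32_t low, int64_t now) {
    assert(now >= 0);
    uint32_t d = low - (uint32_t)(uint64_t)now;   /* difference mod 2^32 */
    int64_t delta = d < 0x80000000u ? (int64_t)d : (int64_t)d - 0x100000000LL;
    return now + delta;
}
```

The signed interpretation of the 32-bit difference is what lets a stored value slightly in the past or future, or on the other side of a 2^32 boundary, come back correctly.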

A 64-bit value expressed in seconds could express values relative to the current age of the universe, but this is likely unnecessary for most purposes, and ability to express fractions of a second is likely more
useful than the ability to express the age of the universe.

NTP only needs relative timestamps, so Dr Mills settled on 32-bit
seconds (since 1900!) + 32-bit fractions, so NTP timestamps have roughly
0.25 ns resolution. The latter corresponds to 5 cm of fiber optic
transmission delay.
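The NTP 32.32 format maps onto Unix seconds and nanoseconds with a constant offset (2208988800 s, i.e. 25567 days from 1900 to 1970) and a fixed-point scale. The 5 cm figure follows too: 2^-32 s is about 0.233 ns, and light in fiber covers roughly 2×10^8 m/s, so about 4.7 cm per tick. A sketch that handles only NTP era 0; the struct and function names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch). */
#define NTP_UNIX_OFFSET 2208988800LL

struct ntp_ts {          /* 32.32 fixed point, as in an NTP timestamp */
    uint32_t seconds;    /* whole seconds since 1900-01-01 (era 0) */
    uint32_t fraction;   /* units of 2^-32 s */
};

/* Convert to Unix seconds and nanoseconds. Era 0 only: the 32-bit seconds
   field wraps in February 2036. */
static void ntp_to_unix(struct ntp_ts t, int64_t *sec, uint32_t *nsec) {
    *sec = (int64_t)t.seconds - NTP_UNIX_OFFSET;
    /* (2^32 - 1) * 10^9 < 2^62, so the product cannot overflow 64 bits. */
    *nsec = (uint32_t)(((uint64_t)t.fraction * 1000000000u) >> 32);
    assert(*nsec < 1000000000u);
}
```

The conversion to nanoseconds truncates, so fractions below one nanosecond (about four NTP ticks) are lost.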

Granted, one could use a 128-bit value, and have both (and in
picoseconds if they wanted). But, this would be overkill.

Or, go extra overkill, and use 256 bits, to express the current age of
the universe in Planck units...

:-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Anton Ertl@21:1/5 to Terje Mathisen on Mon Jun 10 07:04:10 2024

Terje Mathisen <[email protected]> writes:

Modern Unix typically provides 64-bit time_t seconds and a (effectively)
30-bit ns field, so you can store them in a 96-bit container but I don't
think anyone does that?

man time_t tells me:

|time_t
| ...
| Used for time in seconds. According to POSIX, it shall be an
| integer type.

|timespec
| ...
| struct timespec {
| time_t tv_sec; /* Seconds */
| long tv_nsec; /* Nanoseconds */
| };
|
| Describes times in seconds and nanoseconds.
|
| Conforming to: C11 and later; POSIX.1-2001 and later.

So if you have a 64-bit time_t, the C standard does that, and POSIX
does it earlier. Typical ABIs pad struct timespec to 128 bits,
though.
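For reference, reading a struct timespec through the C11 interface the man page mentions looks like this. A small sketch; `now_utc` is a made-up wrapper. On an LP64 system both fields are 8 bytes, so the struct is 16 bytes, i.e. the 128 bits mentioned above.

```c
#include <assert.h>
#include <time.h>

/* Read the current UTC time with C11 timespec_get(), which returns the
   base (TIME_UTC) on success and 0 on failure. */
static struct timespec now_utc(void) {
    struct timespec ts = {0, 0};
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) {
        ts.tv_sec = 0;          /* clock unavailable: report the epoch */
        ts.tv_nsec = 0;
    }
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return ts;
}
```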

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Anton Ertl@21:1/5 to John Savard on Mon Jun 10 07:16:48 2024

John Savard <[email protected]d> writes:

On Sun, 09 Jun 2024 16:52:45 GMT, [email protected]
(Anton Ertl) wrote:

The proper answer to hardware bugs is not adding software limitations,
nor software mitigations (what the hardware makers suggest), but to
fix the hardware.

In the case of Spectre, fixing the hardware has a cost in performance.

How do you know?

Papers on so-called "invisible speculation" schemes have reported
slowdowns <10% for the more advanced schemes, with IIRC some even
reporting a speedup.

The main thing such solutions cost is area and design time. Ok, one
can argue that the area and design time could also be spent on making
faster vulnerable hardware, and then shift the responsibility for
dealing with the vulnerabilities to software, where those mitigations
that can be generally applied cost more than a factor of 2.

So allowing the processor to run code with out-of-order execution
turned off for that code is a way to limit the performance loss to the untrusted code.

Your trust in "trusted code" is unfounded.

And this would work well on my Concertina II architecture, where VLIW features, such as the break bit, and extended register banks of 128
registers each, are present. Code can be generated that avoids
register hazards when run in order.

How do "register hazards" come into play?

But I have seen similar trains of thoughts several times from static
scheduling advocates. They see Spectre as the opportunity to tout
their uncompetitive solutions by advocating solutions (like disabling speculation) that maximize the performance loss.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From John Savard@21:1/5 to [email protected] on Mon Jun 10 01:26:23 2024

On Sun, 9 Jun 2024 22:48:28 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

And just as a further nitpick (if the above weren’t enough), what happens
if the “file” your C compiler is writing to is in a RAM disk?

Well, the output could be stored with no problem, because while it's
on the RAM disk, it can't be executed. It has to be copied from the
RAM disk, into memory that's not pretending to be a disk, by the
loader. So this case doesn't change anything from the case of a real
disk.

John Savard

From Anton Ertl@21:1/5 to [email protected] on Mon Jun 10 07:15:05 2024

[email protected] (MitchAlsup1) writes:

My 66000 needs no retpolines for external calls/returns or for
SVCs and SVRs.

Retpolines are used for performing indirect branches without
activating the indirect-branch predictor.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From EricP@21:1/5 to All on Mon Jun 10 10:57:20 2024

MitchAlsup1 wrote:

EricP wrote:

Scott Lurndal wrote:

John Savard <[email protected]d> writes:

This may be a silly idea... but it seems to be the sort of thing that
current concerns about computer security may be calling for.

It is typical for computers to have a privileged mode of operation,
wherein I/O operations and certain special changes to the state of the
computer are allowed that are barred to normal computational tasks.

For various reasons, miscreants have not been completely foiled by the
existence of this feature.

Some types of instruction that are required for normal computation are
still, to a certain extent, potentially harmful.

So I am thinking it might be useful to have, for example, two states
less privileged than the user state, and some mechanism for user
programs to call subroutines which are in that state until they return
- the return instruction being limited, sort of like a supervisor
call, so it can only return in a proper manner.

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2(Hypervisor), EL1
(Kernel), EL0 (user)

VAX had 4 modes, User, Supervisor, Executive, Kernel.
VMS used Super for debugger and the command language DCL,
Exec was mostly for the file system.
Kernel was for the core of the OS.

What they found was that not only did they not need 4 levels,
it was a pointless overhead to have to constantly switch between them.
(There is a pretty high penalty to switching modes, copying in args,
validating args, doing something usually simple, then switching back,
when it is all the OS's code anyway.)

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

According to these DEC'ers, work on a Virtual Machine Monitor (VMM)
for VAX began in 1981 for a high A1-level secure system.
It required rewriting some microcode.

Virtualizing the VAX Architecture, 1991 https://homes.cs.aau.dk/~kleist/Courses/nds-e05/papers/virtual-vax.pdf

But for similar reasons ring 1 and 2 are not used in x86 machines,
either. {{Now, if we could just go back to 1982 and not invent IDTs, and
call gates, .....}}

I don't know what privileges Unix on VAX used but it was
probably 2 levels because PDP-11 had only 2 levels.

Alpha had 3 levels, User, Supervisor, and a higher third mode called
PAL for Privileged Architecture Library. It was supposed to be thought
of like microcode, privileged subroutines. Then PAL mode was used to
emulate the 4 levels that VMS expected when they ported it.

PAL was microcode in <fast> ROM in the native ISA.

As Anton also points out elsewhere, it was normal macro instructions.

However it does have aspects which are similar to microcode,
which are that PAL code is stored in a writable control store that
is a separate address space from main memory, that it has elevated
privilege while executing allowing access to HW not otherwise allowed,
and that interrupts are disabled while it executes.

But I came to realize that none of that is actually *required*.
It doesn't *need* a third privilege mode, and actually it looks
more expensive performance wise to have one than not.
It would be simpler and cheaper to just transition directly
to and from Super mode without also going through PAL mode.
And there is NO technical reason to restrict access to HW control
registers from Super mode.

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an option.

PAL mode does not require its own on-chip SRAM - it could exist in main
memory addressed through a base physical register or an MMU hack.
And having a dedicated private on-chip SRAM to hold critical OS code
does not mean that it is microcode. I would have this for my design
with an MMU fiddle to hard-wire a VA->PA mapping for some OS code.

After realizing it didn't need to exist, and that PAL mode looks more
expensive than just User/Super modes, I began to wonder why it was there.
Which leads me to here:

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,
and therefore someone can't take DEC's executables and run them
on a clone processor, like what happened to IBM with Amdahl.)

Worked real well for them !!

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Mon Jun 10 15:28:55 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Sun, 09 Jun 2024 14:13:25 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle,
theoretical concept rather than a practical one.

I'm quite sure that IBM would disagree with this statement.

I’m sure they would.

Your attempt to scramble to avoid being wrong was unsuccessful.

From Scott Lurndal@21:1/5 to John Savard on Mon Jun 10 15:30:01 2024

John Savard <[email protected]d> writes:

On Fri, 07 Jun 2024 18:18:33 GMT, [email protected] (Scott Lurndal)
wrote:

There are already more than five security rings in most
processors.

Intel: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, VMX, Enclave, SMM
AMD: Ring 3, Ring 2 (unused), Ring 1 (unused), Ring 0, SVM, SMM
ARM64: Realm Monitor, EL3 (Secure monitor), EL2(Hypervisor), EL1 (Kernel), EL0 (user)

Yes, but these are multiple levels _higher_ than User, and what I was
talking about were levels *lower* than User, so I fail to see how this indicates my idea isn't new.

Or perhaps your complaint is simply that we have too many levels
already.

That's part of it.

But that's somebody else's fault, and doesn't bear on whether
the feature I suggest might be useful.

On the face of it, your feature is not useful.

From Anton Ertl@21:1/5 to EricP on Mon Jun 10 15:23:51 2024

EricP <[email protected]> writes:

PAL code is stored in a writable control store that
is a separate address space from main memory

Given the way that it (the EV45 PAL code) implements the PAL-call IMB,
i.e., by executing enough code to flush the I-cache, means that the
PAL-code is loaded into the I-cache, so I expect that it resides in
normal RAM. If that was in a separate memory space, there would need
to be an additional bit in each I-cache tag that records this fact.

But I came to realize that none of that is actually *required*.
It doesn't *need* a third privilege mode, and actually it looks
more expensive performance wise to have one than not.
It would be simpler and cheaper to just transition directly
to and from Super mode without also going through PAL mode.
And there is NO technical reason to restrict access to HW control
registers from Super mode.

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an option.

PAL mode does not require its own on-chip SRAM - it could exist in main
memory addressed through a base physical register or an MMU hack.
And having a dedicated private on-chip SRAM to hold critical OS code
does not mean that it is microcode. I would have this for my design
with an MMU fiddle to hard-wire a VA->PA mapping for some OS code.

After realizing it didn't need to exist, and that PAL mode looks more
expensive than just User/Super modes, I began to wonder why it was there.
Which leads me to here:

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,

Not really. If there was a patent that is specific to it being a
different address space or a dedicated private on-chip SRAM, that
patent could be easily circumvented by the Amdahl-alike by putting the
PAL-code in RAM and using a base register or MMU hack, as you
describe.

Also if there was enough room for more on-chip SRAM on any of the
Alpha chips, the designers would have used that room to put in
features that make the chip faster.

Given that ARM is able to charge an architecture licensing fee for the instruction set alone, I am sure that DEC had enough patents on its
instruction set, no need for unnecessary and circumventable
implementation ideas.

One other thing they did: they had one PAL code coming with the SRM
console for VMS and Digital OSF/1, and another PAL code with the
ARC/AlphaBIOS console for Windows NT and Linux. This allowed them to
charge extra (quite a lot) for hardware capable of running their
premium OSs, while providing almost competitive prices for hardware
running PC OSs. Unfortunately, the PC-like package was still not price/performance competitive, and AlphaBIOS (which we had on our EV56
boxes) was a horror to work with.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Terje Mathisen@21:1/5 to EricP on Mon Jun 10 18:52:06 2024

EricP wrote:
[snip]

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an
option.

I have written a lot of x86 interrupt handlers, these chips did very
much disable all interrupts when transferring control to my handler.

The typical approach was to do the minimum work possible to save
whatever HW buffer/data needed saving, before executing a STI (SeT
Interrupt enable bit?) and then do anything else that had to be done
while still in the primary handler.

IRET restored flags, IP and CS, transferring control back to whatever
was running when the hw interrupt happened.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Michael S@21:1/5 to Terje Mathisen on Mon Jun 10 20:41:31 2024

On Mon, 10 Jun 2024 18:52:06 +0200
Terje Mathisen <[email protected]> wrote:

EricP wrote:
[snip]

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and
epilogue. x86 did not disable interrupts on exceptions but x64
allows it as an option.

I have written a lot of x86 interrupt handlers, these chips did very
much disable all interrupts when transferring control to my handler.

Intel's official terminology makes a distinction between interrupts and exceptions. The former are external/asynchronous, the latter are internal/synchronous. Exceptions are further sub-divided into faults,
traps and aborts.
The manual says that the IF flag is cleared when an interrupt is handled through
an interrupt gate. It is not cleared when an interrupt is handled through
a trap gate. At that point the manual does not say that exceptions handled
through an interrupt gate also clear the IF flag, but later on (in p.
6.12.1.2 in my copy of the manual) it says that they do.
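Michael S's reading of the manual comes down to the type field of the IDT gate descriptor: an interrupt gate clears IF on entry, a trap gate does not, whether the event is an interrupt or an exception. A minimal C sketch of that rule, using the gate type codes from the Intel SDM; the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type field (bits 3:0 of the descriptor's access byte) of IDT gates. In
   IA-32e (64-bit) mode only 0xE and 0xF are valid interrupt/trap gates. */
enum gate_type {
    GATE_INTR16 = 0x6, GATE_TRAP16 = 0x7,   /* legacy 16-bit gates */
    GATE_INTR32 = 0xE, GATE_TRAP32 = 0xF    /* 32-bit, or 64-bit in IA-32e mode */
};

/* Access byte: P (bit 7), DPL (bits 6:5), S = 0 (bit 4), type (bits 3:0). */
static uint8_t gate_access(enum gate_type type, unsigned dpl) {
    assert(dpl <= 3);
    return (uint8_t)(0x80u | (dpl << 5) | (unsigned)type);
}

/* Entering a handler through an interrupt gate clears EFLAGS.IF; a trap
   gate leaves IF unchanged. Interrupt gates have bit 0 of the type clear. */
static bool gate_clears_if(uint8_t access) {
    unsigned type = access & 0x0Fu;
    assert(type == GATE_INTR16 || type == GATE_TRAP16 ||
           type == GATE_INTR32 || type == GATE_TRAP32);
    return (type & 1u) == 0;
}
```

So a kernel that wants Terje's behaviour (interrupts masked on entry) installs its handlers as interrupt gates (access byte 0x8E for DPL 0), and one that wants EricP's (IF left alone) uses trap gates (0x8F).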

The typical approach was to do the minimum work possible to save
whatever HW buffer/data needed saving, before executing a STI (SeT
Interrupt enable bit?) and then do anything else that had to be done
while still in the primary handler.

IRET restored flags, IP and CS, transferring control back to whatever
was running when the hw interrupt happened.

Terje

From MitchAlsup1@21:1/5 to John Savard on Mon Jun 10 18:43:09 2024

John Savard wrote:

On Sun, 9 Jun 2024 22:48:28 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

And just as a further nitpick (if the above weren’t enough), what
happens if the “file” your C compiler is writing to is in a RAM disk?

Well, the output could be stored with no problem, because while it's
on the RAM disk, it can't be executed. It has to be copied from the
RAM disk, into memory that's not pretending to be a disk, by the
loader. So this case doesn't change anything from the case of a real
disk.

One can create a PTE pointing at that RAM disk page and then allow
someone to execute it directly.
OR
One can copy it somewhere that has execute permission in a single
instruction (MM = memory to memory move)

Neither is any real burden to enabling execute.

John Savard

From MitchAlsup1@21:1/5 to Anton Ertl on Mon Jun 10 19:02:58 2024

Anton Ertl wrote:

John Savard <[email protected]d> writes:

So allowing the processor to run code with out-of-order execution
turned off for that code is a way to limit the performance loss to the untrusted code.

Your trust in "trusted code" is unfounded.

Indeed.

And this would work well on my Concertina II architecture, where VLIW features, such as the break bit, and extended register banks of 128 registers each, are present. Code can be generated that avoids
register hazards when run in order.

How do "register hazards" come into play?

Register values must appear to have been read and written as if the instruction stream was processed sequentially. This is the von Neumann
paradigm.
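The three classic register hazards between two instructions can be written down directly. This toy C sketch (the three-register instruction format and all names are hypothetical) classifies them; register renaming in an out-of-order core removes WAR and WAW, while RAW is a true dependence every implementation must honour, which is why a large register file lets a compiler avoid the first two by allocation.

```c
#include <assert.h>

/* Toy three-address instruction: dst <- src1 op src2 (register numbers). */
struct insn { int dst, src1, src2; };

enum hazard { HAZ_NONE = 0, HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

/* Ordering constraints `later` has on `earlier`. A nonzero result means the
   pair cannot be swapped (without renaming) and still match sequential
   execution. */
static int hazards(struct insn earlier, struct insn later) {
    assert(earlier.dst >= 0 && later.dst >= 0);
    int h = HAZ_NONE;
    if (later.src1 == earlier.dst || later.src2 == earlier.dst)
        h |= HAZ_RAW;            /* true dependence: read after write */
    if (later.dst == earlier.src1 || later.dst == earlier.src2)
        h |= HAZ_WAR;            /* anti-dependence: write after read */
    if (later.dst == earlier.dst)
        h |= HAZ_WAW;            /* output dependence: write after write */
    return h;
}
```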

But I have seen similar trains of thoughts several times from static scheduling advocates. They see Spectre as the opportunity to tout
their uncompetitive solutions by advocating solutions (like disabling speculation) that maximize the performance loss.

My 66000 has made no such claim..........on static scheduling.
My 66000 intends to have both In Order implementations and
Great Big Out of Order implementation.

But a funny thing happens when the ISA is sufficiently expressive
such as my universal constants implementation::

You lose a lot of instructions that are easily scheduled, sometimes
to the point all you have left is the instructions at the core of the algorithm. I have several subroutines with 30-40 FMAC FU instructions
in a row without anything else to do. No amount of code scheduling or
OoOness helps these cases.

- anton

From EricP@21:1/5 to Terje Mathisen on Mon Jun 10 14:32:33 2024

Terje Mathisen wrote:

EricP wrote:
[snip]

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an
option.

I have written a lot of x86 interrupt handlers, these chips did very
much disable all interrupts when transferring control to my handler.

The typical approach was to do the minimum work possible to save
whatever HW buffer/data needed saving, before executing a STI (SeT
Interrupt enable bit?) and then do anything else that had to be done
while still in the primary handler.

IRET restored flags, IP and CS, transferring control back to whatever
was running when the hw interrupt happened.

Terje

Yes, for x86/x64 external interrupts it raises the IRQ priority to that of
the requesting device, masking further interrupts of the same or lower IRQ priority. Or you can explicitly disable all maskable interrupts.

However for exceptions and NMI x86 does not mask interrupts so it is
possible for, say, a page fault or INT instruction to trap to the OS,
saving a frame on the stack, and just then an external interrupt to
arrive, saving another frame.

On the return from the interrupt or exception (we want a common return
code path) we need to know if this is a First Level Exception/Interrupt.
If not, we take the simple path and just REI (Return from Exception or Interrupt).
If it is a FLEI then we need to check for deferred work and jump into
the OS. Also if we are returning to User mode we may need to check
for things like thread APCs/signals that arrived while we were away.
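The common return path described above can be sketched as a nesting counter: only the outermost (first-level) frame checks for deferred work, and only a return to User mode checks for pending APCs/signals. The structure, field names and stub actions below are all hypothetical; a real kernel also has to manage interrupt masking around these checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the sketch. */
struct cpu_state {
    int nest_depth;          /* interrupt/exception frames currently active */
    bool deferred_work;      /* e.g. deferred procedure calls queued by handlers */
    bool pending_signal;     /* APC/signal queued for the current thread */
};

/* Stub actions so the sketch runs; a real kernel would dispatch here. */
static int deferred_runs, signal_deliveries;
static void run_deferred_work(struct cpu_state *c) { c->deferred_work = false; deferred_runs++; }
static void deliver_signals(struct cpu_state *c) { c->pending_signal = false; signal_deliveries++; }

static void on_entry(struct cpu_state *c) { c->nest_depth++; }

/* Shared return path for interrupts and exceptions: nested frames just
   return; the first-level frame does the extra checks first. */
static void on_return(struct cpu_state *c, bool returning_to_user) {
    assert(c->nest_depth > 0);
    if (c->nest_depth == 1) {
        if (c->deferred_work)
            run_deferred_work(c);
        if (returning_to_user && c->pending_signal)
            deliver_signals(c);
    }
    c->nest_depth--;         /* then the equivalent of REI/IRET */
}
```

In the scenario EricP gives, a page fault (depth 1) interrupted by a device interrupt (depth 2) returns from the inner frame with no extra work, and only the outer frame picks up anything the interrupt handler queued.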

On x86 there is also the difference between stack frame shape
depending on whether the prior mode was User or Super.
On x64 they fixed this so they are the same shape.

Then there is the difference between SYSCALL/SYSRET vs SYSENTER/SYSEXIT,
and that one did not set the system stack pointer on entry,
which leaves a security hole if an interrupt arrives just before
you can patch it.

And there was the NMI race condition bug, details of which I have
forgotten but was again something to do with the system stack not
being set correctly after switching to Super and then an NMI arrives
which does not set the stack because the prior mode was already Super.

It's not that these are not handleable, it's that it takes literally
hundreds of instructions in the x86/x64 prologues and epilogues closing
each of these holes and idiosyncrasies. And that's on top of the already
large cost in clocks of the IDT and call gates, and REI instructions.

*None* of this should be necessary.
Even the pipeline drain on mode switch should often be avoidable.

From John Savard@21:1/5 to All on Mon Jun 10 17:11:47 2024

On Mon, 10 Jun 2024 18:43:09 +0000, [email protected] (MitchAlsup1)
wrote:

John Savard wrote:

On Sun, 9 Jun 2024 22:48:28 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

And just as a further nitpick (if the above weren’t enough), what
happens if the “file” your C compiler is writing to is in a RAM disk?

Well, the output could be stored with no problem, because while it's
on the RAM disk, it can't be executed. It has to be copied from the
RAM disk, into memory that's not pretending to be a disk, by the
loader. So this case doesn't change anything from the case of a real
disk.

One can create a PTE pointing at that RAM disk page and then allow
someone to execute it directly.
OR
One can copy it somewhere that has execute permission in a single
instruction (MM = memory to memory move)

Neither is any real burden to enabling execute.

I'm not claiming that locating code in a RAM disk would _prevent_ a
program from enabling its execution. Normally, though, that wouldn't
be done just because it would mess things up for the software that is
supposed to be in charge of reading and writing from the RAM disk if
anything else accesses it.

My point was entirely different. Just as a JIT compiler doesn't run
into issues because it writes code to memory, but because it writes
code to memory with the intent of executing it later - and enabling
both write and execute is restricted in the case of the sort of security-focused system we're discussing - an ordinary compiler
writing to a RAM disk instead of a physical disk runs into no issues.

Writing code in memory is not an issue. Write can be enabled to
memory. Only enabling write and execute together is potentially
subject to restrictions.
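The distinction John draws (writing code is fine; only write and execute at the same time is restricted) is what a W^X-style JIT does on a POSIX-like system: map the buffer writable, emit, then switch it to read+execute. A minimal sketch; `emit_code` is a hypothetical helper, MAP_ANONYMOUS is a widespread extension rather than strict POSIX, and no code is actually executed, since the bytes would be machine-specific.

```c
#if !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE 1      /* expose MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy `len` bytes of generated code into fresh pages, never having them
   writable and executable at once. Returns the buffer (its mapped size in
   *map_len) or NULL on failure. Before jumping into it, some architectures
   also need an I-cache sync, e.g. GCC/Clang's __builtin___clear_cache. */
static void *emit_code(const unsigned char *code, size_t len, size_t *map_len) {
    assert(code != NULL && len > 0 && map_len != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = (len + page - 1) / page * page;   /* round up to whole pages */

    /* Write phase: readable and writable, not executable. */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    memcpy(buf, code, len);

    /* Execute phase: drop write permission before allowing execution. */
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }
    *map_len = size;
    return buf;
}
```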

So the idea of a RAM disk doesn't change anything.

John Savard

From John Savard@21:1/5 to Anton Ertl on Mon Jun 10 17:22:05 2024

On Mon, 10 Jun 2024 07:16:48 GMT, [email protected]
(Anton Ertl) wrote:

John Savard <[email protected]d> writes:

In the case of Spectre, fixing the hardware has a cost in performance.

How do you know?

Papers on so-called "invisible speculation" schemes have reported
slowdowns <10% for the more advanced schemes, with IIRC some even
reporting a speedup.

I've heard claims - especially from Mitch Alsup - that, indeed, all
one has to do is avoid certain _mistakes_ when designing a pipeline,
and there's no room for Spectre any more.

I'm no expert on these things at all, so I don't know that this can't
be true. But I also don't know that it _is_ true.

What does Spectre exploit? It exploits the fact that speculative
execution keeps around data that was fetched into cache by the
speculative execution of some code that was never supposed to be
executed. Just in case it might be useful later.
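The well-known bounds-check-bypass (Spectre variant 1) gadget shows this mechanism, and the usual software mitigation masks the index so that even a mispredicted path stays in bounds, in the spirit of Linux's `array_index_nospec()`. This is a hedged C sketch, not the kernel's implementation: `index_nospec` and `safe_load` are made-up names, and a C compiler may legally turn the mask back into a branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The vulnerable pattern, shown only as a comment:
       if (i < len) y = probe[table[i] * 4096];
   On a mispredicted branch, table[i] is read out of bounds speculatively
   and its value leaks through which probe line ends up cached. */

/* All-ones mask when i < len, zero otherwise, applied to the index so a
   speculatively executed load cannot go out of bounds. */
static size_t index_nospec(size_t i, size_t len) {
    size_t mask = (size_t)0 - (size_t)(i < len);
    return i & mask;
}

static uint8_t safe_load(const uint8_t *table, size_t len, size_t i) {
    assert(table != NULL && len > 0);
    if (i < len)
        return table[index_nospec(i, len)];
    return 0;
}
```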

Obviously, keeping around any data that just happens to be
accidentally in cache, just in case it might be useful later, does
have a positive (but likely very slight) effect on performance. Being
strict about what speculative execution can do, on the other hand, so
nothing is allowed to leak information, will reduce performance... at
least a little bit.
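The leak described above can be sketched as a toy model of a Spectre-v1-style bounds-check bypass. It assumes a 64-byte line, models the cache as a set of resident line numbers (a "hit" stands in for a fast timed access), and uses a `mispredict` flag in place of a real branch predictor; all names are illustrative.

```python
LINE = 64  # assumed cache-line size in bytes


class ToyCache:
    """Records which lines are resident; a hit stands in for a fast access."""

    def __init__(self):
        self.lines = set()

    def load(self, addr):
        self.lines.add(addr // LINE)

    def is_hit(self, addr):
        return addr // LINE in self.lines


def victim(cache, index, bound, array1, probe_base, mispredict):
    # Architecturally the bounds check blocks the out-of-range read.
    # With mispredict=True the predicted path runs anyway: its register
    # result is squashed later, but the cache fill it caused remains.
    if index < bound or mispredict:
        secret = array1[index]
        cache.load(probe_base + secret * LINE)


def recover(cache, probe_base):
    # The attacker probes one line per possible byte value.
    return [v for v in range(256) if cache.is_hit(probe_base + v * LINE)]


array1 = [1, 2, 3, 4, 42]  # index 4 is past bound=4 and plays the secret
cache = ToyCache()
victim(cache, 4, 4, array1, 0, mispredict=True)
leaked = recover(cache, 0)  # [42]
```

Any fix has to stop that leftover fill from being observable, which is where the performance question comes in.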

It could well be that the losses aren't enough to be concerned about,
if this is done carefully. That is, not even the 3% quoted as the cost
of one of the earliest fixes. But since I've heard higher figures for
the fixes for later variants, without positive knowledge, I have to be skeptical about claims that all possible variants of this kind of
attack can be prevented at little cost.

And Rowhammer is even worse. It's not at all clear to me what can be
done without adding an expensive layer of monitoring to memory
accesses. However, only DRAM is vulnerable to Rowhammer, and so it may
be possible to turn cache into a bulwark against it somehow.

John Savard

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to All on Tue Jun 11 00:45:28 2024

I forgot to add that Mc 88120 had these features in 1992.

There was a staging buffer between AGEN and LDalign where up to 48
memory reference instructions could wait for data to become available,
for modified results to wait to be written back to DCache, for
instructions to wait for memory order to resolve, and it could even
retarget where modified data could go. We called this thing the
Conditional Cache.

Stores waited for retirement.
Misses waited for retirement to modify DCache.
Memory references could access data in the CC, so it added 0 cycles of latency
and acted like memory forwarding (== register forwarding),
but the DCache was not modified until the causing instruction
retired. The CC was in effect the memory pipeline !!
..

For example one could have modified data waiting for a DCache
write to become available when a subsequent memory reference
displaces the first line out towards memory, so there is no
line in the cache to write to !! So the CC wrote the data out
towards memory.

1992 !! And it could run MATRIX 300 at 5.9 IPC.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to John Savard on Tue Jun 11 00:27:02 2024

John Savard wrote:

On Mon, 10 Jun 2024 07:16:48 GMT, [email protected]
(Anton Ertl) wrote:

John Savard <[email protected]d> writes:

In the case of Spectre, fixing the hardware has a cost in performance.

How do you know?

Papers on so-called "invisible speculation" schemes have reported
slowdowns <10% for the more advanced schemes, with IIRC some even
reporting a speedup.

I've heard claims - especially from Mitch Alsup - that, indeed, all
one has to do is avoid certain _mistakes_ when designing a pipeline,
and there's no room for Spectre any more.

I'm no expert on these things at all, so I don't know that this can't
be true. But I also don't know that it _is_ true.

Timeline:: the microarchitecture of Intel's latest chips derives
all the way back to the Pentium Pro. Sure, they have tweaked lots of
things and created an explosion of new instructions, but deep inside
it is still PP.

What does Spectre exploit? It exploits the fact that speculative
execution keeps around data that was fetched into cache by the
speculative execution of some code that was never supposed to be
executed. Just in case it might be useful later.

Yes

Obviously, keeping around any data that just happens to be
accidentally in cache, just in case it might be useful later, does
have a positive (but likely very slight) effect on performance. Being
strict about what speculative execution can do, on the other hand, so
nothing is allowed to leak information, will reduce performance... at
least a little bit.

In the course of accessing data from the cache, one also has to check
if there is an outstanding request to memory for this same cache line. So,
when multiple requests all target the same cache line, one only fetches
it once. This check is performed fully associatively in the miss buffer.
Since one is already checking the miss buffer, and the miss buffer has
to have any cache line pass through it during instruction execution::

ALL I have DONE is to not have the MB write into the cache until the
causing instruction retires !! Should the instruction NOT retire, the
data in the miss buffer can be delivered back to memory/whence it came (depending on coherence protocol) and we remain coherent without::
a) delaying the core
b) modifying the cache
c) exposing microarchitectural details
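The scheme above can be pictured with a toy model: fills are held in the miss buffer per in-flight instruction, later loads can be served from the buffer, and the cache is only updated when the causing instruction retires. This is a sketch of the idea as described, not Mitch's actual design; the instruction tags and method names are hypothetical, and capacity, replacement, coherence messages and timing are all ignored.

```python
class MissBuffer:
    """Toy miss buffer: fills are held per in-flight instruction until it retires."""

    def __init__(self):
        self.pending = {}  # hypothetical instruction tag -> set of line numbers

    def fill(self, tag, line):
        # Data has arrived from memory for a load that has not retired yet.
        self.pending.setdefault(tag, set()).add(line)

    def has(self, line):
        # Later loads are served from the buffer, as with forwarding,
        # so they are not delayed even though the cache is unchanged.
        return any(line in lines for lines in self.pending.values())

    def retire(self, tag, cache_lines):
        # Only now does the cache's visible state change.
        cache_lines.update(self.pending.pop(tag, set()))

    def squash(self, tag):
        # Mis-speculated: drop the fill. A real design would hand the line
        # back toward memory as its coherence protocol requires.
        self.pending.pop(tag, None)


cache_lines = set()
mb = MissBuffer()
mb.fill(tag=1, line=42)    # speculative load misses and data arrives
served = mb.has(42)        # True: dependent instructions can use the data
mb.squash(1)               # the load turns out to be on a wrong path
mb.fill(tag=2, line=7)
mb.retire(2, cache_lines)  # cache_lines is now {7}; line 42 never got in
```

A squashed load leaves nothing in the cache for a timing probe to find, while retired loads behave as before.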

The only piece of logic that needs to change is the miss buffer: miss
buffers currently deliver only the "critical word" of the miss and then
dump the buffer into the cache. All I ask is for the miss buffer to
deliver data to all outstanding requests while the initiator is waiting
to retire. {This may need an extra entry (or 2) in the MB to avoid losing performance.}

Intel and AMD (and everyone else, it appears) have not done a major
new microarchitecture since Spectre was announced {they may NOT even
CARE !} Instead of a new microarchitecture, they prefer to add to the
width and depth of the execution window {not that anyone would
disagree}.

Nothing in My 66000 requires any serious modification to the GBOoO
general architecture of the execution window--just modifications to
some sequences to prevent microarchitectural leakage.

It could well be that the losses aren't enough to be concerned about,
if this is done carefully. That is, not even the 3% quoted as the cost
of one of the earliest fixes. But since I've heard higher figures for
the fixes for later variants, without positive knowledge, I have to be skeptical about claims that all possible variants of this kind of
attack can be prevented at little cost.

And Rowhammer is even worse. It's not at all clear to me what can be
done without adding an expensive layer of monitoring to memory
accesses. However, only DRAM is vulnerable to Rowhammer, and so it may
be possible to turn cache into a bulwark against it somehow.

My 66000 is also insensitive to RowHammer and derivatives.....

John Savard

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jun 11 04:00:57 2024

On Mon, 10 Jun 2024 15:28:55 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Sun, 09 Jun 2024 14:13:25 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle, theoretical concept rather than a practical one.

I'm quite sure that IBM would disagree with this statement.

I’m sure they would.

[Your] attempt to scramble to avoid being wrong was unsuccessful.

Conway’s Law applies: any piece of software reflects the organizational structure that produced it. IBM had all these different, incompatible operating systems that didn’t communicate with each other because they
were produced by different divisions of the company that didn’t
communicate with each other. So how to tie them together? Create yet
another system to act as the glue; not to provide any actual intercommunication capability, but just so users could at least run more than
one of them at a time, without needing a lot of extra hardware.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Tue Jun 11 04:07:17 2024

On Mon, 10 Jun 2024 15:23:51 GMT, Anton Ertl wrote:

One other thing they did: they had one PAL code coming with the SRM
console for VMS and Digital OSF/1, and another PAL code with the ARC/AlphaBIOS console for Windows NT and Linux. This allowed them to
charge extra (quite a lot) for hardware capable of running their premium
OSs, while providing almost competitive prices for hardware running PC
OSs.

Let me offer an anecdote related (or not?) to this. I had a client with
several DEC Alphas, all running DEC Unix (variously branded “OSF/1” and “Tru64”). The battery died on one of them, and it came up with a prompt asking for a Windows NT boot disk.

Both the user and I felt disappointed that a piece of such premium DEC hardware, when it lost its mind, would default to asking for Windows NT,
of all OSes.

Unfortunately, the PC-like package was still not price/performance competitive, and AlphaBIOS (which we had on our EV56 boxes) was a horror
to work with.

Windows NT was a disaster to the entire Unix workstation market. The irony
was, NT “Workstation” wasn’t really feature-equivalent to the OSes the Unix workstations were running. But it was enough for the customers, it
seems ...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 11 04:10:00 2024

On Mon, 10 Jun 2024 20:41:31 +0300, Michael S wrote:

Intel's official terminology makes a distinction between interrupts and exceptions. The former are external/asynchronous, the latter are internal/synchronous. Exceptions are further sub-divided into faults,
traps and aborts.

That all sounds very DEC-like.

In particular, the DEC definition of a “fault” is that the saved PC on the stack still points at the instruction that caused the exception, so a return-from-exception will attempt to re-execute the same instruction.
This is exactly what you want for page faults, for example, but also for long-running interruptible instructions that haven’t finished yet.

Whereas a “trap” left the PC pointing at the following instruction. So a return from the exception handler will simply resume execution there.
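The difference comes down to which PC gets saved when the exception is taken. A minimal sketch, assuming the handler is told the address and byte length of the instruction involved (VAX instructions are variable-length, hence the explicit `length` parameter); `Kind` and `saved_pc` are illustrative names only:

```python
from enum import Enum


class Kind(Enum):
    FAULT = "fault"
    TRAP = "trap"


def saved_pc(kind, pc, length):
    """PC saved when an exception is taken at the instruction at `pc`.

    `length` is that instruction's size in bytes.
    """
    if kind is Kind.FAULT:
        # Return re-executes the instruction, e.g. once the page-fault
        # handler has brought the missing page in.
        return pc
    # Return resumes at the following instruction.
    return pc + length
```

Either way the saved PC names the first instruction that has not completed.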

Over the evolution of the VAX architecture, some exceptions which
initially were “traps” became “faults” instead.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jun 11 04:13:49 2024

On Mon, 10 Jun 2024 15:30:01 GMT, Scott Lurndal wrote:

On the face of it, your feature is not useful.

It allows for more fine-grained privilege separation. That is very likely
to be useful to certain, um, TLA markets, shall we say. Even ordinary
users now have a need to run potentially hostile code in a sandbox, just
as part of normal web-browsing. That need might have to develop in more
complex ways in future.

But on the other hand, those increased levels of separation are probably
needed in a less hierarchical, more matrix-connected way. I.e.
capabilities might be more relevant, rather than privilege rings.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to EricP on Tue Jun 11 04:11:23 2024

On Mon, 10 Jun 2024 14:32:33 -0400, EricP wrote:

And there was the NMI race condition bug ...

Not surprised there was trouble with the concept of a “non-maskable interrupt”. When I first heard of such a thing, I threw up my hands in horror.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Tue Jun 11 04:02:33 2024

On Mon, 10 Jun 2024 15:23:51 GMT, Anton Ertl wrote:

Given that ARM is able to charge an architecture licensing fee for the instruction set alone ...

I think that applies to newer versions, not the older ones. Given that ARM
goes back to the 1980s, any patents from the earliest years would have
expired by now.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to John Savard on Tue Jun 11 04:14:48 2024

On Mon, 10 Jun 2024 01:26:23 -0600, John Savard wrote:

.. while it's on the RAM disk, it can't be executed.

Why not? A filesystem can still have executable mode bits on its files, regardless of whether the underlying medium is in persistent storage or
not.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to John Savard on Tue Jun 11 04:18:26 2024

On Mon, 10 Jun 2024 17:11:47 -0600, John Savard wrote:

Write can be enabled to memory. Only enabling write and execute together
is potentially subject to restrictions.

I was going to say, it might be acceptable in current programming
environments to keep the two states (writable versus executable) carefully separated, with an explicit transition from one to the other.

But it turns out this isn’t always enough. I wrote some C code taking advantage of the GCC extension that lets you define nested routines, and
in that situation it creates “thunks” to allow inner routines to access local variables in outer routines, and that requires an executable stack.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to All on Tue Jun 11 04:15:35 2024

On Mon, 10 Jun 2024 18:43:09 +0000, MitchAlsup1 wrote:

One can create a PTE pointing at that RAM disk page and then allow
someone to execute it directly.

Funny, that’s how demand-paging works.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to John Savard on Tue Jun 11 08:54:16 2024

On 2024-06-11 2:22, John Savard wrote:

On Mon, 10 Jun 2024 07:16:48 GMT, [email protected]
(Anton Ertl) wrote:

John Savard <[email protected]d> writes:

In the case of Spectre, fixing the hardware has a cost in performance.

How do you know?

Papers on so-called "invisible speculation" schemes have reported
slowdowns <10% for the more advanced schemes, with IIRC some even
reporting a speedup.

I've heard claims - especially from Mitch Alsup - that, indeed, all
one has to do is avoid certain _mistakes_ when designing a pipeline,
and there's no room for Spectre any more.

I'm no expert on these things at all, so I don't know that this can't
be true. But I also don't know that it _is_ true.

What does Spectre exploit? It exploits the fact that speculative
execution keeps around data that was fetched into cache by the
speculative execution of some code that was never supposed to be
executed. Just in case it might be useful later.

Obviously, keeping around any data that just happens to be
accidentally in cache, just in case it might be useful later, does
have a positive (but likely very slight) effect on performance.

Not always. If the mistakenly speculated cache-fetch /evicted/ some
other data from the (finite-sized) cache, and the evicted data are
needed later on the /true/ execution path, the mistakenly speculated
fetch has a /negative/ effect on performance. (This kind of "timing
anomaly" is very bothersome in static WCET analysis.)
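The eviction effect can be reproduced with a tiny fully associative LRU cache. This is a sketch under simple assumptions (two lines of capacity, strict LRU replacement, and a trace where each access is marked speculative or not); only misses on the true path are counted.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, line):
        """Return True on a hit; on a miss, fill the line, evicting if full."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[line] = True
        return False


def true_path_misses(trace, capacity=2):
    """Count misses on non-speculative accesses; trace is (line, speculative) pairs."""
    cache = LRUCache(capacity)
    misses = 0
    for line, speculative in trace:
        hit = cache.access(line)
        if not speculative and not hit:
            misses += 1
    return misses


without = [("A", False), ("B", False), ("A", False)]
with_spec = [("A", False), ("B", False), ("X", True), ("A", False)]
print(true_path_misses(without), true_path_misses(with_spec))  # 2 3
```

The wrong-path fetch of X evicts A, so the true path pays an extra miss it would not otherwise have had.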

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Terje Mathisen@21:1/5 to EricP on Tue Jun 11 10:03:36 2024

EricP wrote:

Terje Mathisen wrote:

EricP wrote:
[snip]

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an
option.

I have written a lot of x86 interrupt handlers, these chips did very
much disable all interrupts when transferring control to my handler.

The typical approach was to do the minimum work possible to save
whatever HW buffer/data needed saving, before executing a STI (SeT
Interrupt enable bit?) and then do anything else that had to be done
while still in the primary handler.

IRET restored flags, IP and CS, transferring control back to whatever
was running when the hw interrupt happened.

Terje

Yes, for x86/x64 external interrupts it raises the IRQ priority to that of the requesting device, masking further interrupts of the same or lower IRQ priority. Or you can explicitly disable all maskable interrupts.

I guess my vintage is showing! When I wrote HW interrupt handlers, none
of this applied so it was a much simpler world.

Initially there was no real priority in use because my handler would
start with IRQ disabled, I would poll/read the single byte serial port
buffer, then clear a hardware interrupt flag and then simply IRET.

A little later (286?) it became possible to selectively re-enable only
those interrupts that had a higher priority, so I would do that when my
most critical work was done.

Even later the serial port chip was replaced with a far better one which
had 16-byte IO buffers and programmable interrupt levels. AFAIR I would typically set it to signal when the buffer was half full, but 14 of 16
was also possible?
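Assuming the chip in question is the 16550A, its receive-FIFO interrupt trigger level is chosen from 1, 4, 8 or 14 bytes by bits 7:6 of the FIFO Control Register, so both "half full" (8) and 14 of 16 were available. The sketch below only computes the register value; writing it to the port (base + 2 on a PC COM port) is left out.

```python
# 16550A FIFO Control Register (FCR) bits.
FCR_ENABLE = 0x01    # enable the transmit and receive FIFOs
FCR_CLEAR_RX = 0x02  # clear the receive FIFO
FCR_CLEAR_TX = 0x04  # clear the transmit FIFO

# Bits 7:6 select how many received bytes raise the interrupt.
RX_TRIGGER = {1: 0b00, 4: 0b01, 8: 0b10, 14: 0b11}


def fcr_value(trigger_bytes):
    """FCR byte that enables and clears both FIFOs with the given RX trigger."""
    if trigger_bytes not in RX_TRIGGER:
        raise ValueError("16550A trigger levels are 1, 4, 8 or 14 bytes")
    return FCR_ENABLE | FCR_CLEAR_RX | FCR_CLEAR_TX | (RX_TRIGGER[trigger_bytes] << 6)
```

For example, `fcr_value(8)` gives 0x87 and `fcr_value(14)` gives 0xC7.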

However for exceptions and NMI x86 does not mask interrupts so it is
possible for, say, a page fault or INT instruction to trap to the OS,
saving a frame on the stack, and just then an external interrupt to
arrive, saving another frame.

On the return from the interrupt or exception (we want a common return
code path) we need to know if this is a First Level Exception/Interrupt.
If not, we take the simple path and just REI (Return from Exception or Interrupt). If it is a FLEI then we need to check for deferred work and jump into
the OS. Also if we are returning to User mode we may need to check
for things like thread APCs/signals that arrived while we were away.
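The common return path can be sketched with a per-CPU nesting counter. This is a simplified model under stated assumptions: the `Cpu` fields and function names are hypothetical, and it ignores the race a real kernel must close when an interrupt arrives between the depth check and the REI (typically by running this with interrupts disabled and re-checking).

```python
class Cpu:
    """Toy per-CPU state; the field names are hypothetical."""

    def __init__(self):
        self.depth = 0             # current exception/interrupt nesting level
        self.deferred_work = []    # work postponed to the outermost level
        self.pending_signals = []  # signals/APCs for the interrupted thread


def enter(cpu):
    # Every exception or interrupt entry path bumps the nesting level.
    cpu.depth += 1


def common_return(cpu, returning_to_user):
    """Shared epilogue; returns the actions taken, in order."""
    actions = []
    if cpu.depth == 1:
        # First Level Exception/Interrupt: drain deferred work, and deliver
        # signals/APCs only when going back to User mode.
        while cpu.deferred_work:
            actions.append("run " + cpu.deferred_work.pop(0))
        if returning_to_user:
            while cpu.pending_signals:
                actions.append("deliver " + cpu.pending_signals.pop(0))
    # Nested levels take the simple path straight to REI.
    cpu.depth -= 1
    actions.append("REI")
    return actions


cpu = Cpu()
enter(cpu)                             # page fault taken from User mode
cpu.deferred_work.append("dpc")
cpu.pending_signals.append("SIGUSR1")
enter(cpu)                             # device interrupt arrives on top
inner = common_return(cpu, returning_to_user=False)  # ["REI"]
outer = common_return(cpu, returning_to_user=True)   # ["run dpc", "deliver SIGUSR1", "REI"]
```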

On x86 there is also the difference between stack frame shape
depending on whether the prior mode was User or Super.
On x64 they fixed this so they are the same shape.

Then there is the difference between SYSCALL/SYSRET vs SYSENTER/SYSEXIT,
and that one did not set the system stack pointer on entry,
which leaves a security hole if an interrupt arrives just before
you can patch it.

And there was the NMI race condition bug, details of which I have
forgotten but was again something to do with the system stack not
being set correctly after switching to Super and then an NMI arrives
which does not set the stack because the prior mode was already Super.

It's not that these are not handleable, it's that it takes literally
hundreds of instructions in the x86/x64 prologues and epilogues closing
each of these holes and idiosyncrasies. And that's on top of the already large clock-cycle cost for the IDT and call gates, and REI instructions.

*None* of this should be necessary.
Even the pipeline drain on mode switch should often be avoidable.

Ouch! Glad I got out of the IRQ handler business before 1990.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From John Levine@21:1/5 to All on Tue Jun 11 09:37:32 2024

According to Lawrence D'Oliveiro <[email protected]d>:

I'm quite sure that IBM would disagree with this statement.

I’m sure they would. But they invented virtualization in CP/CMS because their attempt at an “interactive timesharing” system, CMS, was only single-user.

There's no need to make up silly stories like this when the actual
history is so well documented. CP/CMS was the IBM Cambridge Scientific
Center's response to the end of CTSS and the loss of the bid to build
Multics. CP and CMS were developed in tandem and it was always a
time-sharing system, originally on a modified 360/40, later on a
360/67.

For a long time IBM insisted the real time-sharing system was TSS,
then later TSO on MVS, while CP was just an unsupported lab curiosity. Eventually they gave in to the obvious, renamed it to VM, ported it to
S/370, and made it a real product.

The Wikipedia article has lots of details https://en.wikipedia.org/wiki/History_of_CP/CMS

As does this IBM paper on the history of VM https://www.vm.ibm.com/history/50th/vm370ori.pdf
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to Terje Mathisen on Tue Jun 11 13:12:23 2024

On Tue, 11 Jun 2024 10:03:36 +0200
Terje Mathisen <[email protected]> wrote:

I guess my vintage is showing! When I wrote HW interrupt handlers,
none of this applied so it was a much simpler world.

Initially there was no real priority in use because my handler would
start with IRQ disabled, I would poll/read the single byte serial
port buffer, then clear a hardware interrupt flag and then simply
IRET.

I think even the very first IBM PC had one Intel 8259 PIC; the PC/XT had it
for sure. So priorities were there. How useful they were is another question.

One of the problems was that right from the beginning IBM engineers
ignored Intel's recommendations to wire external interrupts to interrupt
vectors 32 or higher. They thought that they knew better. Of course,
they didn't.

A little later (286?) it became possible to selectively re-enable
only those interrupts that had a higher priority, so I would do that
when my most critical work was done.

PC/AT had two 8259 PICs connected as master and slave. So, more
priority levels at cost of less simple programming.
Now, the 80286 CPU had a lot of interrupt-processing features unheard of in earlier CPUs, but those were available only in protected mode, so
that's probably not what you had in mind above.

Even later the serial port chip was replaced with a far better one
which had 16-byte IO buffers and programmable interrupt levels. AFAIR
I would typically set it to signal when the buffer was half full, but
14 of 16 was also possible?

However for exceptions and NMI x86 does not mask interrupts so it is possible for, say, a page fault or INT instruction to trap to the
OS, saving a frame on the stack, and just then an external
interrupt to arrive, saving another frame.

On the return from the interrupt or exception (we want a common
return code path) we need to know if this is a First Level Exception/Interrupt. If not, we take the simple path and just REI
(Return from Exception or Interrupt). If it is a FLEI then we need to
check for deferred work and jump into the OS. Also if we are
returning to User mode we may need to check for things like thread APCs/signals that arrived while we were away.

On x86 there is also the difference between stack frame shape
depending on whether the prior mode was User or Super.
On x64 they fixed this so they are the same shape.

Then there is the difference between SYSCALL/SYSRET vs
SYSENTER/SYSEXIT, and that one did not set the system stack pointer
on entry, which leaves a security hole if an interrupt arrives just
before you can patch it.

And there was the NMI race condition bug, details of which I have
forgotten but was again something to do with the system stack not
being set correctly after switching to Super and then an NMI arrives
which does not set the stack because the prior mode was already
Super.

It's not that these are not handleable, it's that it takes literally
hundreds of instructions in the x86/x64 prologues and epilogues
closing each of these holes and idiosyncrasies. And that's on top
of the already large clock-cycle cost for the IDT and call gates, and
REI instructions.

*None* of this should be necessary.
Even the pipeline drain on mode switch should often be avoidable.

Ouch! Glad I got out of the IRQ handler business before 1990.

Terje

I think Eric exaggerates more than a little the complexity of the
end-of-interrupt processing needed in the common case.
Maybe the code is long, but the absolute majority of it is executed very
rarely, if at all.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jun 11 14:07:48 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 20:41:31 +0300, Michael S wrote:

Intel's official terminology makes a distinction between interrupts and
exceptions. The former are external/asynchronous, the latter are
internal/synchronous. Exceptions are further sub-divided into faults,
traps and aborts.

That all sounds very DEC-like.

In particular, the DEC definition of a “fault” is that the saved PC on the stack still points at the instruction that caused the exception, so a return-from-exception will attempt to re-execute the same instruction.
This is exactly what you want for page faults, for example, but also for long-running interruptible instructions that haven’t finished yet.

That distinction predated the VAX, of course. Pretty much every
hardware architecture at the time supported similar instruction restart semantics, particularly when it supported some form of memory management
trap behavior.

For example, the Burroughs mainframes distinguished between restartable faults/exceptions and interrupts.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jun 11 14:11:55 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 14:32:33 -0400, EricP wrote:

And there was the NMI race condition bug ...

Not surprised there was trouble with the concept of a “non-maskable interrupt”. When I first heard of such a thing, I threw up my hands in horror.

1) NMI are incredibly useful in certain cases, particularly for in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the processor)
3) ARM refused to support NMI in Aarch64 (partially because they didn't
have a spare exception vector). They've backtracked and hacked in a
solution using the interrupt controller to create a pseudo-unmaskable
interrupt due to customer demand.

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/a-profile-non-maskable-interrupts

Back in the 90's, we had a custom PCI card with a single button that would trigger an NMI for debugging new hardware.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jun 11 14:04:59 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 15:23:51 GMT, Anton Ertl wrote:

Given that ARM is able to charge an architecture licensing fee for the
instruction set alone ...

I think that applies to newer versions, not the older ones. Given that ARM goes back to the 1980s, any patents from the earliest years would have expired by now.

It has nothing to do with patents.

The architecture license provides far more than the ability
to implement the ARM instruction set. BTDT.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Anton Ertl on Tue Jun 11 10:49:17 2024

Anton Ertl wrote:

EricP <[email protected]> writes:

PAL code is stored in a writable control store that
is a separate address space from main memory

The way that it (the EV45 PAL code) implements the PAL-call IMB,
i.e., by executing enough code to flush the I-cache, means that the
PAL-code is loaded into the I-cache, so I expect that it resides in
normal RAM. If that was in a separate memory space, there would need
to be an additional bit in each I-cache tag that records this fact.

It is normal SRAM but private to each core.
So there are no tags, no coherence.

Access to this SRAM is enabled by PAL mode and the program counter
register contains an address in that SRAM. I don't see an explicit
statement to say so but it looks to me that this is just a physical
address that the cpu decodes to this SRAM when in PAL mode.
I don't see a specific restriction that this memory be used only for
read-only code; with care, data could be stored in it.

The initial address for entry to PAL code comes from the 26-bit field
in the CALL_PAL instruction. That code number is validated, then shifted
left by some number of bits defined by a control register, say 6 bits,
and that is OR'd with base address register and stuffed into the PC.
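The entry-address computation described above can be written out as a sketch. The 6-bit shift is the "say 6 bits" example given here, and `num_codes` (how many PAL functions are implemented) is a hypothetical parameter; neither value is taken from an Alpha manual, and real hardware would trap on an unimplemented code rather than raise an error.

```python
CALL_PAL_FIELD_BITS = 26


def pal_entry_pc(func, pal_base, shift=6, num_codes=256):
    """Entry PC for CALL_PAL `func`: validate, shift, OR with the base.

    `shift` and `num_codes` are assumed values for illustration.
    """
    if not 0 <= func < (1 << CALL_PAL_FIELD_BITS):
        raise ValueError("CALL_PAL function field is 26 bits")
    if func >= num_codes:
        raise ValueError("unimplemented PAL function")
    # OR acts like addition only if the base is aligned past the offset bits.
    if pal_base & ((num_codes << shift) - 1):
        raise ValueError("PAL base not aligned for this table size")
    return pal_base | (func << shift)


print(hex(pal_entry_pc(0x83, 0x8000)))  # 0xa0c0
```

Each function gets a fixed 64-byte slot in this model, which is enough for a short entry sequence or a branch to a longer routine.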

But I came to realize that none of that is actually *required*.
It doesn't *need* a third privilege mode, and actually it looks
more expensive performance wise to have one than not.
It would be simpler and cheaper to just transition directly
to and from Super mode without also going through PAL mode.
And there is NO technical reason to restrict access to HW control
register from Super mode.

Many processors automatically disable interrupts on trap because it
greatly simplifies the race conditions in their prologue and epilogue.
x86 did not disable interrupts on exceptions but x64 allows it as an option.
PAL mode does not require its own on-chip SRAM - it could exist in main
memory addressed through a base physical register or an MMU hack.
And having a dedicated private on-chip SRAM to hold critical OS code
does not mean that it is microcode. I would have this for my design
with an MMU fiddle to hard-wire a VA->PA mapping for some OS code.

After realizing it didn't need to exist, and that PAL mode looks more
expensive than just User/Super modes, I began to wonder why it was there.
Which leads me to here:

(I think PAL mode was a way to patent a feature that made the
ISA impossible to copy without their permission,

Not really. If there was a patent that is specific to it being a
different address space or a dedicated private on-chip SRAM, that
patent could be easily circumvented by the Amdahl-alike by putting the PAL-code in RAM and using a base register or MMU hack, as you
describe.

If a clone used off chip memory then it will have much worse performance
and not be competitive. But what I believe PAL patented was the particular
set of functional behaviors: the third mode that enables this memory,
and disables interrupts, and enables HW register access, etc.

That would force cloners to rewrite those parts of an OS that depend
on this, which blocks running DEC's EXE's on your partial clone.

Also if there was enough room for more on-chip SRAM on any of the
Alpha chips, the designers would have used that room to put in
features that make the chip faster.

Given that ARM is able to charge an architecture licensing fee for the instruction set alone, I am sure that DEC had enough patents on its instruction set, no need for unnecessary and circumventable
implementation ideas.

As I understand it, you can't patent an ISA but you can patent a particular implementation of a function or feature. One of the patents that protects
the ARM32 ISA was on its interrupt mechanism. If you want to clone it then
you have to duplicate their interrupt mechanism. ARM has sued and won
on that basis which is why there are no ARM clones.

ARM vs picoTurbo patent lawsuit, 2001 https://www.electronicsweekly.com/news/archived/resources-archived/arm-and-picoturbo-settle-patent-lawsuit-2001-12/

One other thing they did: they had one PAL code coming with the SRM
console for VMS and Digital OSF/1, and another PAL code with the ARC/AlphaBIOS console for Windows NT and Linux. This allowed them to
charge extra (quite a lot) for hardware capable of running their
premium OSs, while providing almost competitive prices for hardware
running PC OSs. Unfortunately, the PC-like package was still not price/performance competitive, and AlphaBIOS (which we had on our EV56
boxes) was a horror to work with.

- anton

But this is exactly what I was thinking. They would not be able to
charge this way if there are exact clones because I could just run
the VMS PAL code on my cheap-o clone.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Tue Jun 11 16:55:45 2024

Lawrence D'Oliveiro wrote:

On Mon, 10 Jun 2024 20:41:31 +0300, Michael S wrote:

Intel's official terminology makes a distinction between interrupts and
exceptions. The former are external/asynchronous, the latter are
internal/synchronous. Exceptions are further sub-divided into faults,
traps and aborts.

That all sounds very DEC-like.

In particular, the DEC definition of a “fault” is that the saved PC on the
stack still points at the instruction that caused the exception, so a return-from-exception will attempt to re-execute the same instruction.
This is exactly what you want for page faults, for example, but also for
long-running interruptible instructions that haven’t finished yet.

Whereas a “trap” left the PC pointing at the following instruction. So a
return from the exception handler will simply resume execution there.

Both have the property where the PC is pointing at the first instruction
not executed.

Over the evolution of the VAX architecture, some exceptions which
initially were “traps” became “faults” instead.
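To make the fault/trap distinction concrete, here is a toy Python sketch. It is not DEC's or Intel's mechanism; the names `resume_pc`, `FAULT`, `TRAP` and the fixed 4-byte instruction length are assumptions for illustration. In both cases the saved PC ends up at the first instruction that has not completed.

```python
# Toy model only: where the saved PC points for a "fault" vs a "trap".
# The names and the fixed instruction length are illustrative assumptions.
FAULT = "fault"  # instruction did not complete; handler returns to re-execute it
TRAP = "trap"    # instruction completed; handler returns to the next one


def resume_pc(kind: str, insn_pc: int, insn_len: int) -> int:
    """Return the PC saved for the exception handler's return.

    Either way, the result is the first instruction that has not completed.
    """
    if kind == FAULT:
        return insn_pc
    if kind == TRAP:
        return insn_pc + insn_len
    raise ValueError(f"unknown exception kind: {kind!r}")


# A page fault on a 4-byte instruction at 0x1000 retries that instruction...
print(hex(resume_pc(FAULT, 0x1000, 4)))  # 0x1000
# ...while a trap raised by the same instruction resumes after it.
print(hex(resume_pc(TRAP, 0x1000, 4)))   # 0x1004
```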

From John Savard@21:1/5 to All on Tue Jun 11 13:07:56 2024

On Tue, 11 Jun 2024 00:45:28 +0000, [email protected] (MitchAlsup1)
wrote:

I forgot to add that Mc 88120 had these features in 1992.

Stores waited for retirement.

Given that in the case of external RAM, as opposed to registers inside
the processor, there is only one possible value at any location...
memory doesn't have a pile of rename locations to play with... I am so unimaginative that I don't think I could design a CPU in which stores
to RAM didn't wait for the instruction that performed them to retire.

That, though, wouldn't save me from Spectre, since Spectre leaks
information through data _read_ by earlier speculated code (code that
didn't really happen) being left in the cache.

John Savard

From John Savard@21:1/5 to All on Tue Jun 11 13:27:35 2024

On Tue, 11 Jun 2024 00:27:02 +0000, [email protected] (MitchAlsup1)
wrote:

ALL I have DONE is to not have the MB write into the cache until the
causing instruction retires !!

I suppose that depends on how you define "write".

If by "write" you mean store data in the cache, for eventual writing
out into RAM, well, since RAM doesn't contain "rename locations" to
play with, it seems to me that any CPU designer had better do that.

At least, I'm not imaginative enough to think of doing it any other
way.

However, if by "write" you mean to change the state of the cache in
any way, such as by reading data from memory... now, _then_ you would
indeed have done what is necessary to combat Spectre.

Obviously, though, a "load" instruction will _never_ retire unless it
can read the data from memory it is trying to put in a register.

So apparently WHAT you have REALLY DONE is to modify how memory reads
work...

if the data a load instruction requires is not already in the cache,
then a direct read from memory is performed which *completely
bypasses* the cache; this data (and its associated address) are
retained by the CPU to be placed in the cache _if_ the instruction is
actually executed and when it retires.

And, in fact, the various cache levels have to work this way too. You
have an L1 cache miss, but an L2 cache hit? Fine, you take your data
directly from L2, and don't promote the data into L1 until instruction retirement.

So now the process of fetching data from memory is _not_ done by
fetching always from L1 and going _through_ L1 to access L2, and
going _through_ L2 to access RAM, which seems to be the usual way
these days.

That certainly can be done. But it isn't quite as simple and obvious
as you seem to claim.

My 66000 is also insensitive to RowHammer and derivatives.....

When I first read that sentence, I was completely incredulous. DRAM is sensitive to RowHammer because it's gone to feature sizes which are
beyond the state-of-the-art to do properly... so corners have been
cut.

How a CPU can be "insensitive" to it was mysterious.

After all, RowHammer is caused by multiple rapid-fire accesses to the
same address, or to related addresses, in memory.

But given that you are now explicitly passing accesses to DRAM around
the caches, instead of having the caches access DRAM as needed,
perhaps that also makes it possible for the CPU to detect suspicious
behavior more easily. (Since _related_ accesses may be used in a
RowHammer attack, simply pruning redundant memory accesses from the
operation stream won't be enough. I could see you doing _that_ as part
of "doing it right".)

If the "row" that was "hammered" just consisted of the 16 consecutive
locations that can be accessed speedily after the first one is ready,
then pruning redundant accesses _would_ be enough, since to "hammer" a
row one has to access it hundreds of times, not at most 32 times; but
I'm afraid that isn't the case.

John Savard

From John Savard@21:1/5 to [email protected] on Tue Jun 11 13:30:10 2024

On Tue, 11 Jun 2024 08:54:16 +0300, Niklas Holsti <[email protected]d> wrote:

Not always. If the mistakenly speculated cache-fetch /evicted/ some
other data from the (finite-sized) cache, and the evicted data are
needed later on the /true/ execution path, the mistakenly speculated
fetch has a /negative/ effect on performance. (This kind of "timing
anomaly" is very bothersome in static WCET analysis.)

Ouch. Another argument for having a victim cache. And a benefit of
doing it in what is apparently Mitch Alsup's way - holding off cache
updates until instruction retirement.

John Savard

From MitchAlsup1@21:1/5 to John Savard on Tue Jun 11 20:50:14 2024

John Savard wrote:

On Tue, 11 Jun 2024 00:45:28 +0000, [email protected] (MitchAlsup1)
wrote:

I forgot to add that Mc 88120 had these features in 1992.

Stores waited for retirement.

Given that in the case of external RAM, as opposed to registers inside
the processor, there is only one possible value at any location...
memory doesn't have a pile of rename locations to play with... I am so unimaginative that I don't think I could design a CPU in which stores
to RAM didn't wait for the instruction that performed them to retire.

I never said it did. Effectively, there is a buffer that feeds the OoO
engine results as early as possible (acting like a cache, but a cache
with the property that it can be discarded, in part or in whole, just
like instructions in the shadow of a branch can be discarded). So, the
pipeline gets fed by the buffer, and the update of the actual cache is
delayed until the instruction causing the event retires.

So depending on where you look you can see the front of the pipeline or
the end of the pipeline--and it works the exact same way as branch
prediction, and with the same property that one can back up to some
sane point based on external events (coherent messages,...) rather
than branch misprediction.

That, though, wouldn't save me from Spectre, since Spectre leaks
information through data _read_ by earlier speculated code (code that
didn't really happen) being left in the cache.

Here, Spectré performs back-to-back dependent LDs; by doing these many
times in a row, the prediction mechanism will lock in on these
instructions being OK to execute.

Then, the first LD returns a pointer while the TLB returns "bad access".
But the loaded pointer goes through AGEN before permissions have been
fully checked. This second LD is not allowed to modify the data cache,
or Spectré could see this microarchitectural state change.

When the bad LD is discarded from execution, so is the damage done by
the second LD--its data is discarded from the buffer and never makes
it to the cache. Instructions entering the pipeline afterward do not see
the data that should never have been there. Q.E.D. no Spectré
sensitivity.
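A minimal Python sketch of the buffer described above, under stated assumptions: the class name `SpeculativeLineBuffer`, the sequence-number tagging, and the dict-based "memory" are all hypothetical, and backing up on coherence messages is left out. It shows the property being claimed: a squashed dependent load leaves no trace in the real cache.

```python
# Hypothetical sketch of a temporally organized buffer in front of the data
# cache. Lines fetched by in-flight loads live in the buffer, tagged with the
# fetching load's sequence number; they reach the real cache only when that
# load retires, and a squash simply drops them.
class SpeculativeLineBuffer:
    def __init__(self):
        self.cache = {}    # line address -> data; architecturally visible state
        self.pending = []  # (seq, line address, data) for in-flight loads

    def load(self, seq, line, memory):
        """Service a load without changing the cache."""
        if line in self.cache:
            return self.cache[line]
        for _, addr, data in self.pending:
            if addr == line:
                return data          # hit in the buffer, used like an L0
        data = memory[line]          # miss: fetch, but keep it out of the cache
        self.pending.append((seq, line, data))
        return data

    def retire(self, seq):
        """Install lines fetched by loads up to and including seq."""
        still_pending = []
        for s, addr, data in self.pending:
            if s <= seq:
                self.cache[addr] = data
            else:
                still_pending.append((s, addr, data))
        self.pending = still_pending

    def squash(self, first_bad_seq):
        """Drop lines fetched by the faulting load and everything younger."""
        self.pending = [p for p in self.pending if p[0] < first_bad_seq]


# The dependent-load pair above: LD #1 reads a pointer it may not access,
# LD #2 dereferences it speculatively before the permission check lands.
memory = {0x100: 0x200, 0x200: 7}
buf = SpeculativeLineBuffer()
ptr = buf.load(1, 0x100, memory)
buf.load(2, ptr, memory)
buf.squash(1)                 # LD #1 faults: discard it and LD #2
print(buf.cache)              # {} -> no footprint left for a timing probe
```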

John Savard

From MitchAlsup1@21:1/5 to John Savard on Tue Jun 11 21:18:47 2024

John Savard wrote:

On Tue, 11 Jun 2024 00:27:02 +0000, [email protected] (MitchAlsup1)
wrote:

ALL I have DONE is to not have the MB write into the cache until the causing instruction retires !!

I suppose that depends on how you define "write".

I mean the memory cell does not get modified.

If by "write" you mean store data in the cache, for eventual writing
out into RAM, well, since RAM doesn't contain "rename locations" to
play with, it seems to me that any CPU designer had better do that.

The cache itself is not modified until the memory reference retires.
But there is a buffer holding the data which can be accessed as if
it were an L0 cache until the data migrates to the real cache at
retirement.

At least, I'm not imaginative enough to think of doing it any other
way.

However, if by "write" you mean to change the state of the cache in
any way, such as by reading data from memory... now, _then_ you would
indeed have done what is necessary to combat Spectre.

The cache is not modified; the data is available through another means,
a means that can be backed up like a mispredicted branch. The buffer
I am talking about is temporally organized, not spatially organized.

Obviously, though, a "load" instruction will _never_ retire unless it
can read the data from memory it is trying to put in a register.

The LD instruction can obtain data from either the buffer or from
the data cache itself. The buffer covers the execution window,
allowing the LD to retire (assuming every older instruction also
retires).

So apparently WHAT you have REALLY DONE is to modify how memory reads
work...

I pipelined them through a temporally organized memory execution
window. This also provides for allowing the memory system to run
OoO wrt program order, and detect actual ordering violations, and
rerun the memory references in a proper memory order by rerunning
the references in order.

You get relaxed memory order performance and precise memory order simultaneously.
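The check-and-rerun idea can be sketched generically. This is the textbook memory-disambiguation check (a load that performed before an older store to the same address is a violation), assumed here for illustration; `ordering_violations` and `replay_from` are hypothetical names, and the sketch does not claim to describe My 66000's actual hardware.

```python
# Generic memory-disambiguation check (textbook technique, for illustration).
# ops lists memory references in the order they were *performed*; each has a
# program-order sequence number, a kind ("LD" or "ST"), and an address.
def ordering_violations(ops):
    """Return seq numbers of loads that performed before an older store to
    the same address, and so may have read stale data and must be rerun."""
    performed_loads = []  # (seq, addr) of loads performed so far
    bad = set()
    for seq, kind, addr in ops:
        if kind == "LD":
            performed_loads.append((seq, addr))
        elif kind == "ST":
            for ld_seq, ld_addr in performed_loads:
                if ld_addr == addr and ld_seq > seq:
                    bad.add(ld_seq)   # younger load ran ahead of this store
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    return sorted(bad)


def replay_from(ops, violations):
    """Rerun, in program order, everything from the oldest violating load on."""
    if not violations:
        return []
    first = violations[0]
    return sorted((op for op in ops if op[0] >= first), key=lambda op: op[0])


# Program order: 1: ST [A]   2: LD [A]   3: LD [B]
# Performed out of order: both loads ran before the store.
performed = [(2, "LD", 0xA), (3, "LD", 0xB), (1, "ST", 0xA)]
v = ordering_violations(performed)
print(v)                          # [2]
print(replay_from(performed, v))  # [(2, 'LD', 10), (3, 'LD', 11)]
```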

if the data a load instruction requires is not already in the cache,
then a direct read from memory

The request is forwarded towards memory through the cache hierarchy
and data arrives back at the requestor (sooner or later).

is performed which *completely
bypasses* the cache;

Yes, critical word first.

this data (and its associated address) are
retained by the CPU to be placed in the cache _if_ the instruction is actually executed and when it retires.

Yes !! While the data resides in the buffer, the whole line can be
accessed by a number of memory reference instructions.

And, in fact, the various cache levels have to work this way too. You
have an L1 cache miss, but an L2 cache hit? Fine, you take your data
directly from L2, and don't promote the data into L1 until instruction retirement.

I use an exclusive cache organization, so data arriving at the CPU
goes into the buffer, which upon retirement goes into L1, which has the
potential to push an L1->L2 line, and so forth.
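A rough Python sketch of an exclusive organization, assuming two levels, LRU replacement, and capacities counted in lines (all hypothetical choices). It illustrates the flow described: lines enter L1 only at retirement, an L1 victim is pushed into L2, and a line moving up out of L2 is removed there rather than copied.

```python
from collections import OrderedDict


# Hypothetical two-level exclusive hierarchy (sizes in lines, LRU replacement
# assumed): a line lives in L1 or L2, never both. Lines enter L1 only when the
# retirement buffer releases them; L1 victims are pushed down into L2.
class ExclusiveHierarchy:
    def __init__(self, l1_lines, l2_lines):
        self.levels = [OrderedDict(), OrderedDict()]  # oldest entry first
        self.capacity = [l1_lines, l2_lines]
        self.pushed_out = []  # victims leaving L2 (toward L3 in a fuller model)

    def _insert(self, level, addr, data):
        lines = self.levels[level]
        lines[addr] = data
        lines.move_to_end(addr)
        if len(lines) > self.capacity[level]:
            victim, vdata = lines.popitem(last=False)   # least recently used
            if level + 1 < len(self.levels):
                self._insert(level + 1, victim, vdata)  # e.g. L1 -> L2 push
            else:
                self.pushed_out.append(victim)

    def fill_at_retirement(self, addr, data):
        # Exclusivity: if the line was sitting in L2, it moves rather than copies.
        self.levels[1].pop(addr, None)
        self._insert(0, addr, data)


h = ExclusiveHierarchy(l1_lines=2, l2_lines=2)
for line in "ABC":
    h.fill_at_retirement(line, line.lower())
print(list(h.levels[0]), list(h.levels[1]))  # ['B', 'C'] ['A']
```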

So now the process of fetching data from memory is _not_ done by
fetching always from L1 and going _through_ L1 to access L2, and
going _through_ L2 to access RAM, which seems to be the usual way
these days.

It's back to the Athlon/Opteron organizations.

That certainly can be done. But it isn't quite as simple and obvious
as you seem to claim.

If you had worked on them, you would recognize the advantages and
disadvantages.

My 66000 is also insensitive to RowHammer and derivatives.....

When I first read that sentence, I was completely incredulous. DRAM is sensitive to RowHammer because it's gone to feature sizes which are
beyond the state-of-the-art to do properly... so corners have been
cut.

How a CPU can be "insensitive" to it was mysterious.

After all, RowHammer is caused by multiple rapid-fire accesses to the
same address, or to related addresses, in memory.

Yes, the write buffer in my DRAM controller is the L3 cache. Modified
data in the L3 migrates towards DRAM as DRAM cycles permit, but there
is no way to cause a line to be continuously written into DRAM.
If a modified line has migrated to DRAM, and it gets modified again
in the L3, that 2nd write will not be performed until a refresh cycle
on that DRAM is performed.

Thus if one tries to RowHammer My 66000 DRAM, the DRAM gets a refresh
cycle between each write.
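The write-gating rule above can be sketched as follows. This is a hypothetical model (the class `RefreshGatedL3` is invented, it tracks lines rather than DRAM rows, and one refresh covers the whole device), intended only to show why repeated stores to one line do not turn into repeated DRAM writes.

```python
# Hypothetical sketch: a modified line that has already been written to DRAM
# since the last refresh waits, dirty, in L3 until the next refresh.
# Tracks lines rather than DRAM rows; one refresh covers the whole device.
class RefreshGatedL3:
    def __init__(self):
        self.dirty = set()                  # modified lines held in L3
        self.written_since_refresh = set()
        self.dram_writes = 0

    def modify(self, line):
        self.dirty.add(line)

    def drain(self):
        """Use a spare DRAM cycle to write back whatever is eligible."""
        for line in sorted(self.dirty):     # sorted() iterates over a copy
            if line not in self.written_since_refresh:
                self.dram_writes += 1
                self.written_since_refresh.add(line)
                self.dirty.discard(line)

    def refresh(self):
        self.written_since_refresh.clear()


l3 = RefreshGatedL3()
for _ in range(100_000):      # "hammer" one line with stores
    l3.modify(0x40)
    l3.drain()
print(l3.dram_writes)         # 1: the rest are absorbed by L3 until a refresh
l3.refresh()
l3.drain()
print(l3.dram_writes)         # 2
```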

But given that you are now explicitly passing accesses to DRAM around
the caches, instead of having the caches access DRAM as needed,
perhaps that also makes it possible for the CPU to detect suspicious
behavior more easily. (Since _related_ accesses may be used in a
RowHammer attack, simply pruning redundant memory accesses from the
operation stream won't be enough. I could see you doing _that_ as part
of "doing it right".)

Banging on related cache lines also results in refresh cycles.

If the "row" that was "hammered" just consisted of the 16 consecutive locations that can be accessed speedily after the first one is ready,
then pruning redundant accesses _would_ be enough, since to "hammer" a
row one has to access it hundreds of times, not at most 32 times; but
I'm afraid that isn't the case.

I doubt that RowHammer still works when refreshes are interspersed
between accesses--RowHammer generally works because the events are
not protected by refreshes--the DRC sees the right ROW open and
simply streams at the open bank.

Also note, there are no instructions in My 66000 that force a cache line
to DRAM, whereas there are instructions that can force a cache line
into L3. L3 is the buffer to DRAM. Nothing gets to DRAM without
going through L3, and nothing comes out of DRAM that is not also
buffered by L3. So, if 96 cores simultaneously read a line residing in
DRAM, DRAM is read once and 95 cores are serviced through L3. So,
you can't RowHammer based on reading DRAM, either.
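A minimal sketch of the read side, assuming (hypothetically) an L3 that never evicts during the experiment; the class `L3Front` is invented for illustration. 96 reads of one line cost a single DRAM access.

```python
# Hypothetical sketch: every DRAM read is buffered by L3, so many cores
# reading the same line cost a single DRAM access. Eviction from L3 is
# omitted to keep the example short.
class L3Front:
    def __init__(self, dram):
        self.dram = dram
        self.lines = {}     # lines currently held in L3
        self.dram_reads = 0

    def read(self, line):
        if line not in self.lines:
            self.dram_reads += 1            # only the first reader touches DRAM
            self.lines[line] = self.dram[line]
        return self.lines[line]


l3 = L3Front(dram={0x80: 42})
values = [l3.read(0x80) for _core in range(96)]
print(l3.dram_reads, values.count(42))   # 1 96
```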

John Savard

From MitchAlsup1@21:1/5 to John Savard on Tue Jun 11 21:20:45 2024

John Savard wrote:

On Tue, 11 Jun 2024 08:54:16 +0300, Niklas Holsti <[email protected]d> wrote:

Not always. If the mistakenly speculated cache-fetch /evicted/ some
other data from the (finite-sized) cache, and the evicted data are
needed later on the /true/ execution path, the mistakenly speculated
fetch has a /negative/ effect on performance. (This kind of "timing anomaly" is very bothersome in static WCET analysis.)

That is why you don't update the cache until the causing instruction
retires.

Ouch. Another argument for having a victim cache. And a benefit of
doing it in what is apparently Mitch Alsup's way - holding off cache
updates until instruction retirement.

Change your thought from victim cache to pipeline buffer pre cache.

John Savard

From Lawrence D'Oliveiro@21:1/5 to John Levine on Wed Jun 12 02:47:48 2024

On Tue, 11 Jun 2024 09:37:32 -0000 (UTC), John Levine wrote:

According to Lawrence D'Oliveiro <[email protected]d>:

I'm quite sure that IBM would disagree with this statement.

I’m sure they would. But they invented virtualization in CP/CMS because their attempt at an “interactive timesharing” system, CMS, was only single-user.

There's no need to make up silly stories like this ...

No need to take my word for it. Bitsavers added issues of a magazine
called “Mainframe” a few months back. I took the trouble to read the first one--it’s all about IBM, as though other “mainframe” machines didn’t exist. There’s a description of the background to CP/CMS (later VM/CMS) there.

From George Neuner@21:1/5 to [email protected] on Tue Jun 11 22:18:44 2024

On Tue, 11 Jun 2024 04:07:17 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

Windows NT was a disaster to the entire Unix workstation market. The irony was, NT “Workstation” wasn’t really feature-equivalent to the OSes the Unix workstations were running. But it was enough for the customers, it
seems ...

The differences were almost all at user level: the most glaring
examples being NT's single user shell [even on server editions], lack
of admin tools in the workstation edition, and lack of development
tools in all editions.

Considered as an "operating system" - ie. what could be implemented on
the platform - NT certainly was (mostly) equivalent to Unix. Note:
equivalence is not "sameness" - NT was implemented differently, its
APIs were different, and code that was "equivalent" in function often
did not look the same (and was not transportable).

BTDTGTTS

From Lawrence D'Oliveiro@21:1/5 to All on Wed Jun 12 02:51:51 2024

On Tue, 11 Jun 2024 16:55:45 +0000, MitchAlsup1 wrote:

Lawrence D'Oliveiro wrote:

In particular, the DEC definition of a “fault” is that the saved PC on the stack still points at the instruction that caused the exception, so
a return-from-exception will attempt to re-execute the same
instruction. This is exactly what you want for page faults, for
example, but also for long-running interruptible instructions that
haven’t finished yet.

Whereas a “trap” left the PC pointing at the following instruction. So a return from the exception handler will simply resume execution there.

Both have the property where the PC is pointing at the first instruction
not executed.

Perhaps you meant “completed” rather than “executed”.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jun 12 02:50:24 2024

On Tue, 11 Jun 2024 14:04:59 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 15:23:51 GMT, Anton Ertl wrote:

Given that ARM is able to charge an architecture licensing fee for the
instruction set alone ...

I think that applies to newer versions, not the older ones. Given that
ARM goes back to the 1980s, any patents from the earliest years would
have expired by now.

It has nothing to do with patents.

The architecture license provides far more than the ability to implement
the arm instruction set. BTDT.

IANAL, but there are four kinds of “intellectual property”: copyrights, patents, trademarks and trade secrets.

If you were incorporating logic components developed by ARM, then
licensing those might be covered by copyrights and trade secrets.

But if you’re a company like Apple, which designs and builds its own
chips, then they need neither of those things.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jun 12 02:53:41 2024

On Tue, 11 Jun 2024 14:11:55 GMT, Scott Lurndal wrote:

1) NMI are incredibly useful in certain cases, particularly for
in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the processor)

Do you see a contradiction between the two? In that a “non-maskable” interrupt inevitably has to be “maskable” in certain situations. And how does that affect your in-kernel debugger?

From Thomas Koenig@21:1/5 to Scott Lurndal on Wed Jun 12 05:38:01 2024

Scott Lurndal <[email protected]> schrieb:

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 15:23:51 GMT, Anton Ertl wrote:

Given that ARM is able to charge an architecture licensing fee for the
instruction set alone ...

I think that applies to newer versions, not the older ones. Given that ARM goes back to the 1980s, any patents from the earliest years would have expired by now.

It has nothing to do with patents.

The architecture license provides far more than the ability
to implement the arm instruction set. BTDT.

The current spat between ARM and Qualcomm is quite interesting in
that respect. It seems that ARM now demands that all PCs using Snapdragon-X-CPUs be destroyed. In return, Qualcomm accuses ARM
of all sorts of bad things, including threatening to terminate
Qualcomm's licenses if they insisted on enforcing their contractual
rights.

The spat also appears to be about ARM wanting a bigger slice of
the pie on smartphones: they demand a share of the sales price of
the final product instead of the CPU. That actually sounds like
something that the antitrust authorities might be interested in.

If the cases ever go to trial, at least one ARM license agreement
will be publicly available.

And, finally, if people will excuse the pun: This looks like
StrongARM tactics.

From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Wed Jun 12 06:31:17 2024

On Wed, 12 Jun 2024 05:38:01 -0000 (UTC), Thomas Koenig wrote:

The spat also appears to be about ARM wanting a bigger slice of the pie on smartphones: they demand a share of the sales price of the final product instead of the CPU. That actually sounds like something that the
antitrust authorities might be interested in.

This kind of greed can only boost the fortunes of alternatives like
RISC-V.

From John Levine@21:1/5 to All on Wed Jun 12 07:47:18 2024

According to Lawrence D'Oliveiro <[email protected]d>:

There's no need to make up silly stories like this ...

No need to take my word for it. Bitsavers added issues of a magazine
called “Mainframe” a few months back. I took the trouble to read the first one--it’s all about IBM, as though other “mainframe” machines didn’t exist. There’s a description of the background to CP/CMS (later VM/CMS) there.

I see Mainframe Journal, with the earliest issue being Jul/Aug 1988. Is
that it? I don't see anything in the ToC that looks like a VM overview.

In any event, I'd find the second article I linked to, the VM history
written by IBMers who were there, more credible than some random third
party magazine. CMS really was written at the same time as CP, and
they always intended them to work together as a time-sharing system.

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Lawrence D'Oliveiro@21:1/5 to John Levine on Wed Jun 12 07:56:50 2024

On Wed, 12 Jun 2024 07:47:18 -0000 (UTC), John Levine wrote:

In any event, I'd find the second article I linked to, the VM history
written by IBMers who were there, more credible than some random third
party magazine.

By all means, check the bios of the authors, included as with any
magazine. It was written by IBM pros, for IBM pros.

From John Dallman@21:1/5 to D'Oliveiro on Wed Jun 12 09:46:00 2024

In article <v48ihl$sc37$[email protected]>, [email protected]d (Lawrence
D'Oliveiro) wrote:

Windows NT was a disaster to the entire Unix workstation market.
The irony was, NT “Workstation” wasn’t really feature-equivalent to
the OSes the Unix workstations were running. But it was enough for
the customers, it seems ...

It had important advantages for many customers:

Lower costs at equivalent or better performance, once the Intel Pentium
Pro had appeared. The OS cost much less than a commercial Unix, and mass production meant the hardware was much cheaper.

It ran Microsoft Office. This was really important to corporate managers,
who wanted their engineers to be able to read and create Office documents,
and were frustrated by Unix workstations' inability to do so, and their engineers not being worried about the problem. Supplying extra PCs was expensive and took up office space. Software emulation was slow and
unreliable; add-in cards to provide a PC capability were rare, expensive
and unreliable.

John

From EricP@21:1/5 to Lawrence D'Oliveiro on Wed Jun 12 07:48:53 2024

Lawrence D'Oliveiro wrote:

On Mon, 10 Jun 2024 14:32:33 -0400, EricP wrote:

And there was the NMI race condition bug ...

Not surprised there was trouble with the concept of a “non-maskable interrupt”. When I first heard of such a thing, I threw up my hands in horror.

Yes, NMI has "reentrancy issues".

8086 has an NMI input pin and according to the manual has
higher priority than the maskable interrupt INTR input pin.
It will re-trigger on each rising edge so if you don't want it to
trigger multiple times you have to latch the input yourself.
Intel manual suggests it might be used for a power fail routine.
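A toy Python model of the edge-triggered behavior and the external latch. The signal trace and the latch are hypothetical; real hardware would also need software to clear the latch at the end of the handler.

```python
# Toy model of an edge-triggered NMI input: one NMI per 0 -> 1 transition.
# The "latch" is a hypothetical external flip-flop that holds the line high
# until software clears it (never, in this short trace).
def count_nmis(samples):
    count, prev = 0, 0
    for level in samples:
        if level and not prev:   # rising edge
            count += 1
        prev = level
    return count


def latch(samples):
    held, out = 0, []
    for level in samples:
        held = held or level     # once set, stays set
        out.append(held)
    return out


bouncy_power_fail = [0, 1, 0, 1, 0, 1]
print(count_nmis(bouncy_power_fail))         # 3
print(count_nmis(latch(bouncy_power_fail)))  # 1
```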

I seem to recall that on the original PC someone started selling
add-on FPU boards. Maybe that was Weitek or AMD? Anyway, they hijacked
the unused NMI input and used it to signal an FPU error.

From EricP@21:1/5 to Scott Lurndal on Wed Jun 12 07:57:43 2024

Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 10 Jun 2024 14:32:33 -0400, EricP wrote:

And there was the NMI race condition bug ...

Not surprised there was trouble with the concept of a “non-maskable
interrupt”. When I first heard of such a thing, I threw up my hands in
horror.

1) NMI are incredibly useful in certain cases, particularly for in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the processor)
3) ARM refused to support NMI in Aarch64 (partially because they didn't
have a spare exception vector). They've backtracked and hacked in a
solution using the interrupt controller to create a pseudo-unmaskable
interrupt due to customer demand.

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/a-profile-non-maskable-interrupts

Back in the 90's, we had a custom PCI card with a single button that would trigger an NMI for debugging new hardware.

As you point out, this "NMI" is maskable, it's just masked in a different control register than usual interrupts.
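The "maskable, just somewhere else" point, and the interrupt-controller approach to a pseudo-NMI, can be illustrated with a toy priority-mask model. All numeric priorities are made up; the only convention borrowed is the GIC-style rule that a numerically lower value is more urgent and is signalled only when it is below the current mask. `nmi_enabled` is a hypothetical stand-in for the separate control that can still mask the "NMI".

```python
# Toy priority-mask model of a pseudo-NMI built with the interrupt controller.
# All numeric values are made up. GIC-style convention: a numerically lower
# value is more urgent, and an interrupt is signalled only if its value is
# below the current mask.
NMI_PRIO = 0x20        # hypothetical priority reserved for the "NMI"
NORMAL_PRIO = 0xA0     # hypothetical priority of ordinary device interrupts
MASK_IRQS_ON = 0xF0    # everything gets through
MASK_IRQS_OFF = 0x60   # "disable interrupts" lowers the mask value, no flag


def signalled(prio, mask, nmi_enabled=True):
    if prio == NMI_PRIO and not nmi_enabled:
        return False   # the "NMI" can still be masked, just by a separate control
    return prio < mask


print(signalled(NORMAL_PRIO, MASK_IRQS_OFF))  # False: ordinary IRQ blocked
print(signalled(NMI_PRIO, MASK_IRQS_OFF))     # True: pseudo-NMI still delivered
```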

The problem with a real NMI is controlling reentrancy.

From EricP@21:1/5 to All on Wed Jun 12 08:22:03 2024

MitchAlsup1 wrote:

John Savard wrote:

After all, RowHammer is caused by multiple rapid-fire accesses to the
same address, or to related addresses, in memory.

Yes, the write buffer in my DRAM controller is the L3 cache. Modified
data in the L3 migrates towards DRAM as DRAM cycles permit, but there
is no way to cause a line to be continuously written into DRAM.
If a modified line has migrated to DRAM, and it gets modified again
in the L3, that 2nd write will not be performed until a refresh cycle
on that DRAM is performed.

Thus if one tries to RowHammer My 66000 DRAM, the DRAM gets a refresh
cycle between each write.

What does it do if L3 receives more writes than it has ways in a row?
Does it stall evictions from L2?

Let's say L3 is 4-way assoc and all four ways in an L3 row have been
updated, then a 5th way in that same row is written from L2.
L3 has no place to hold that 5th way and it can't evict one
of the other 4 ways because that could cause RowHammer.

Seems to me that all it can do is stall the 5th write from L2 until
DRAM refresh rolls around and re-enables one of the pending L3 writes,
which would back up victim evicts from L2.
Or maybe L3 has a small fully assoc emergency overflow buffer,
but still that could fill up too.
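The scenario in the question can be sketched as follows, assuming (hypothetically) that a refresh instantly makes every pinned way reusable; the class `PinnedL3Set` is invented for illustration.

```python
# Hypothetical sketch of the scenario in the question: a 4-way L3 set whose
# dirty lines may not go to DRAM again until the next refresh. When every
# way is pinned, the only option left is to refuse (stall) the L2 victim.
class PinnedL3Set:
    def __init__(self, ways=4):
        self.ways = ways
        self.pinned = []   # dirty lines waiting for a refresh

    def accept_l2_victim(self, line):
        """True if accepted; False means L2 must hold the victim and stall."""
        if line in self.pinned:
            return True                  # already resident: merge new data
        if len(self.pinned) < self.ways:
            self.pinned.append(line)
            return True
        return False                     # backpressure into L2

    def refresh(self):
        # Pinned lines may now be written back, after which they are clean
        # and their ways can be reused (modeled as instantaneous here).
        self.pinned.clear()


s = PinnedL3Set(ways=4)
print([s.accept_l2_victim(n) for n in range(5)])  # [True, True, True, True, False]
s.refresh()
print(s.accept_l2_victim(4))                       # True
```

A small fully associative overflow buffer would only add a few more `True` results before the same `False`.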

From EricP@21:1/5 to Michael S on Wed Jun 12 09:38:17 2024

Michael S wrote:

On Tue, 11 Jun 2024 10:03:36 +0200
Terje Mathisen <[email protected]> wrote:

*None* of this should be necessary.
Even the pipeline drain on mode switch should often be avoidable.

Ouch! Glad I got out of the IRQ handler business before 1990.

Terje

I think Eric more than a little exaggerates the level of
complexity of end-of-interrupt processing needed in the common case.
Maybe the code is long, but the absolute majority of it is executed very rarely, if at all.

Possibly, as I do have a tendency to get somewhat animated about this.
I can't find it just now but a while back I was looking at some
Linux source code for the x86 interrupt return path,
and it went on for page after page after page.

I did find this diagram, which shows the slightly convoluted but still understandable return path for Linux, and some x86 assembler for it:

https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch04s08.html

https://coral.googlesource.com/linux-imx/+/refs/heads/release-chef/arch/x86/entry/entry_32.S

You can scroll down to the label ret_from_intr:, for example, which does a
conditional jb resume_kernel, then falls through into resume_userspace,
which invokes DISABLE_INTERRUPTS and calls prepare_exit_to_usermode,
then jumps to restore_all, which eventually does INTERRUPT_RETURN.

prepare_exit_to_usermode is in common.c here and does quite
a lot of other checks:

https://coral.googlesource.com/linux-imx/+/refs/heads/release-chef/arch/x86/entry/common.c

The problem I have with this approach is that it deals with all the race conditions (e.g. a nested interrupt posts a new softirq between when you
checked for pending softirqs and the IRET) by running with interrupts
disabled for long instruction sequences. I consider that to be a poor way
to do this as that blocks processing all other interrupts.

Ideally the ISA and hardware should be designed so the interrupt return
path should not have to disable interrupts at all, or at worst then
just for a few instructions.
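A toy model of the check-then-IRET window (hypothetical names; not the Linux code linked above). It shows why the window needs interrupts off: if a nested interrupt can post work after the final check, that work is missed until some later interrupt. It models only the window itself, not how long interrupts stay disabled, which is the part being objected to.

```python
# Toy model (hypothetical names; not Linux's actual code) of the race in an
# exit-to-user path: check pending work, then IRET. A nested interrupt that
# posts work *after* the final check is missed unless that window runs with
# interrupts disabled.
def exit_to_user(pending, nested_irq_in_window, irqs_off_in_window):
    """Return (handled, left_over) once the simulated IRET has happened."""
    handled = []
    while pending:                       # do the work, then check again
        handled.append(pending.pop(0))
    # --- window between the final check and IRET ---
    if nested_irq_in_window:
        if irqs_off_in_window:
            # The interrupt stays pending, is taken right after IRET, and its
            # own exit path sees and handles the work before user code runs.
            handled.append("resched")
        else:
            # The nested handler returns into this path after the check.
            pending.append("resched")
    return handled, pending             # anything left over was missed


print(exit_to_user([], True, irqs_off_in_window=False))  # ([], ['resched'])
print(exit_to_user([], True, irqs_off_in_window=True))   # (['resched'], [])
```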

From Scott Lurndal@21:1/5 to John Levine on Wed Jun 12 13:56:35 2024

John Levine <[email protected]> writes:

According to Lawrence D'Oliveiro <[email protected]d>:

There's no need to make up silly stories like this ...

No need to take my word for it. Bitsavers added issues of a magazine
called “Mainframe” a few months back. I took the trouble to read the first
one--it’s all about IBM, as though other “mainframe” machines didn’t exist. There’s a description of the background to CP/CMS (later VM/CMS) there.

I see Mainframe Journal, with the earliest issue being Jul/Aug 1988. Is
that it? I don't see anything in the ToC that looks like a VM overview.

In any event, I'd find the second article I linked to, the VM history
written by IBMers who were there, more credible than some random third
party magazine. CMS really was written at the same time as CP, and
they always intended them to work together as a time-sharing system.

Lynn's old posts make this pretty clear. And he was there.

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jun 12 13:57:26 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Wed, 12 Jun 2024 07:47:18 -0000 (UTC), John Levine wrote:

In any event, I'd find the second article I linked to, the VM history
written by IBMers who were there, more credible than some random third
party magazine.

By all means, check the bios of the authors, included as with any
magazine. It was written by IBM pros, for IBM pros.

In other words, you have nothing.

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jun 12 13:53:21 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Tue, 11 Jun 2024 14:11:55 GMT, Scott Lurndal wrote:

1) NMI are incredibly useful in certain cases, particularly for
in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the
processor)

Do you see a contradiction between the two?

No.

From Stefan Monnier@21:1/5 to All on Wed Jun 12 10:52:20 2024

I think Eric more than a little exaggerates the level of
complexity of end-of-interrupt processing needed in the common case.
Maybe the code is long, but the absolute majority of it is executed very
rarely, if at all.

Possibly, as I do have a tendency to get somewhat animated about this.
I can't find it just now but a while back I was looking at some
Linux source code for the x86 interrupt return path,
and it went on for page after page after page.

Beside the code size cost and associated runtime impact, there's also
the fact that this complexity inevitably comes with an increased risk
of bugs.

Nick Maclaren could go on and on about this as an infinite source of bugs
that are so hard to track down that they're basically never even
diagnosed correctly (let alone fixed).

Stefan

From Terje Mathisen@21:1/5 to All on Wed Jun 12 20:34:13 2024

MitchAlsup1 wrote:

John Savard wrote:

On Tue, 11 Jun 2024 00:27:02 +0000, [email protected] (MitchAlsup1)
wrote:

ALL I have DONE is to not have the MB write into the cache until the
causing instruction retires !!

I suppose that depends on how you define "write".

I mean the memory cell does not get modified.

If by "write" you mean store data in the cache, for eventual writing
out into RAM, well, since RAM doesn't contain "rename locations" to
play with, it seems to me that any CPU designer had better do that.

The cache itself is not modified until the memory reference retires.
But there is a buffer holding the data which can be accessed as if
it were an L0 cache until the data migrates to the real cache at
retirement.

At least, I'm not imaginative enough to think of doing it any other
way.

However, if by "write" you mean to change the state of the cache in
any way, such as by reading data from memory... now, _then_ you would
indeed have done what is necessary to combat Spectre.

The cache is not modified; the data is available through another means,
a means that can be backed up like a mispredicted branch. The buffer
I am talking about is temporally organized, not spatially organized.

Obviously, though, a "load" instruction will _never_ retire unless it
can read the data from memory it is trying to put in a register.

The LD instruction can obtain data from either the buffer or from
the data cache itself. The buffer covers the execution window,
allowing the LD to retire (assuming every older instruction also
retires).

So apparently WHAT you have REALLY DONE is to modify how memory reads
work...

I pipelined them through a temporally organized memory execution
window. This also provides for allowing the memory system to run
OoO wrt program order, and detect actual ordering violations, and
rerun the memory references in a proper memory order by rerunning
the references in order.

You get relaxed memory order performance and precise memory order simultaneously.

if the data a load instruction requires is not already in the cache,
then a direct read from memory

The request is forwarded towards memory through the cache hierarchy
and data arrives back at requestor (sooner or later).

is performed which *completely
bypasses* the cache;

Yes, critical word first.

this data (and its associated address) are
retained by the CPU to be placed in the cache _if_ the instruction is
actually executed and when it retires.

Yes !! While the data resides in the buffer, the whole line can be
accessed by a number of memory reference instructions.
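A minimal Python sketch of the buffer behaviour described above, under stated assumptions: speculatively fetched lines sit in a program-ordered side buffer (the hypothetical `MemoryWindow` below), later loads can hit in it, and only retirement moves a line into the cache, so a squash after a mispredict leaves cache state exactly as it was. Line granularity, lookup order and all names are illustrative, not My 66000's actual design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecEntry:
    seq: int     # program-order sequence number of the memory reference
    line: int    # cache-line address
    data: bytes  # line contents returned from the hierarchy


@dataclass
class MemoryWindow:
    """Hypothetical model: speculative lines live here, not in the cache."""
    cache: Dict[int, bytes] = field(default_factory=dict)  # architectural cache
    window: List[SpecEntry] = field(default_factory=list)  # ordered by seq

    def load(self, seq: int, line: int, fetch: Callable[[int], bytes]) -> bytes:
        # Youngest matching window entry first, then the cache itself.
        for e in reversed(self.window):
            if e.line == line:
                return e.data
        if line in self.cache:
            return self.cache[line]
        data = fetch(line)                               # miss: go further out
        self.window.append(SpecEntry(seq, line, data))   # cache left untouched
        return data

    def retire_through(self, seq: int) -> None:
        # Everything up to `seq` has retired: migrate those lines into the cache.
        for e in self.window:
            if e.seq <= seq:
                self.cache[e.line] = e.data
        self.window = [e for e in self.window if e.seq > seq]

    def squash_after(self, seq: int) -> None:
        # Mispredict at `seq`: drop younger entries; the cache never saw them.
        self.window = [e for e in self.window if e.seq <= seq]
```

In this model a squashed speculative load leaves `cache` unchanged, which is the property being discussed in the thread.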

And, in fact, the various cache levels have to work this way too. You
have an L1 cache miss, but an L2 cache hit? Fine, you take your data
directly from L2, and don't promote the data into L1 until instruction
retirement.

I use an exclusive cache organization, so data arriving at the CPU
goes into the buffer, which upon retirement goes into L1, which has the
potential to push a L1->L2 line, and so forth.
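To make the exclusive organization concrete, here is a small Python sketch, assuming fully associative levels with LRU replacement (real caches are set-associative) and the hypothetical names `ExclusiveLevel` and `retire`. A line lives in exactly one level; retiring a line into L1 removes it from the lower levels, and L1's victim cascades down one level at a time.

```python
from collections import OrderedDict
from typing import Optional


class ExclusiveLevel:
    """One cache level; fully associative with LRU order (a simplification)."""

    def __init__(self, name: str, capacity: int,
                 lower: Optional["ExclusiveLevel"] = None):
        self.name = name
        self.capacity = capacity
        self.lower = lower
        self.lines = OrderedDict()  # addr -> None, least recently used first

    def insert(self, addr: int) -> None:
        self.lines[addr] = None
        self.lines.move_to_end(addr)  # most recently used at the end
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)  # evict the LRU line
            if self.lower is not None:
                self.lower.insert(victim)  # push e.g. L1 -> L2, L2 -> L3
            # with no lower level the victim would go back to DRAM (not modelled)

    def extract(self, addr: int) -> bool:
        # Exclusivity: a hit at this level means the line leaves it.
        if addr in self.lines:
            del self.lines[addr]
            return True
        return False


def retire(l1: ExclusiveLevel, addr: int) -> None:
    """Move a line from the retirement buffer into L1 (hypothetical helper)."""
    level = l1.lower
    while level is not None:  # the line may currently sit lower down
        level.extract(addr)
        level = level.lower
    l1.insert(addr)
```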

So now the process of fetching data from memory is _not_ done by
fetching always from L1 and going _through_ L1 to access L2, and
going _through_ L2 to access RAM, which seems to be the usual way
these days.

It's back to the Athlon/Opteron organizations.

That certainly can be done. But it isn't quite as simple and obvious
as you seem to claim.

If you had worked on them, you would recognize the advantages and disadvantages.

My 66000 is also insensitive to RowHammer and derivatives.....

When I first read that sentence, I was completely incredulous. DRAM is
sensitive to RowHammer because it's gone to feature sizes which are
beyond the state-of-the-art to do properly... so corners have been
cut.

How a CPU can be "insensitive" to it was mysterious.

After all, RowHammer is caused by multiple rapid-fire accesses to the
same address, or to related addresses, in memory.

Yes, the write buffer in my DRAM controller is the L3 cache. Modified
data in the L3 migrates towards DRAM as DRAM cycles permit, but there
is no way to cause a line to be continuously written into DRAM.
If a modified line has migrated to DRAM, and it gets modified again
in the L3, that 2nd write will not be performed until a refresh cycle
on that DRAM is performed.

Thus if one tries to RowHammer My 66000 DRAM, the DRAM gets a refresh cycle
between each write.
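The write-gating rule can be sketched in Python as follows, assuming a single refresh event covers the whole DRAM (real refresh is per bank and per row group) and using hypothetical names. It only limits how often the same line is rewritten between refreshes; by itself it says nothing about neighboring rows, which is the question Terje raises in reply.

```python
from typing import Dict, Set


class RefreshGatedWriter:
    """Hypothetical model: at most one DRAM write per line between refreshes."""

    def __init__(self) -> None:
        self.awaiting_refresh: Set[int] = set()  # lines written since last refresh
        self.pending: Dict[int, bytes] = {}      # newer data parked in L3

    def write_back(self, line: int, data: bytes, dram: Dict[int, bytes]) -> bool:
        if line in self.awaiting_refresh:
            self.pending[line] = data  # 2nd write waits; later data overwrites
            return False
        dram[line] = data
        self.awaiting_refresh.add(line)
        return True

    def refresh(self, dram: Dict[int, bytes]) -> None:
        # A refresh cycle has happened: each parked line may be written once more.
        self.awaiting_refresh.clear()
        parked, self.pending = self.pending, {}
        for line, data in parked.items():
            self.write_back(line, data, dram)
```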

Rowhammer can modify nearby lines, not just the ones that are being
hammered, right? How do you guarantee that all neighbors will also be refreshed?

Similarly, if the accesses are LOCK XADD operations, and you have
multiple CPUs (or cores not sharing a common last level cache), then I
don't see any way to keep those accesses from making it all the way to
the RAM chips?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Lynn Wheeler@21:1/5 to John Levine on Wed Jun 12 11:54:55 2024

John Levine <[email protected]> writes:

In any event, I'd find the second article I linked to, the VM history
written by IBMers who were there, more credible than some random third
party magazine. CMS really was written at the same time as CP, and
they always intended them to work together as a time-sharing system.

Some of the MIT CTSS/7094 people went to the 5th flr to do Multics;
others went to the science center on the 4th flr to do virtual machines, internal network, invent GML in 1969, other interactive applications.

cambridge science center wanted a 360/50 to add virtual memory to
... but all the spare 360/50s were going to FAA ATC project ... and they
had to settle for 360/40. (virtual machine) CP/40 (running on bare
hardware using hardware virtual memory mods) was developed in parallel
with CMS (running on bare 360/40). When CP/40 virtual machines were
operational, they then could run CMS in CP/40 virtual machines.

Melinda history
http://www.leeandmelindavarian.com/Melinda#VMHist
and CP/40 http://www.leeandmelindavarian.com/Melinda/JimMarch/CP40_The_Origin_of_VM370.pdf
my OCR from Comeau's original paper https://www.garlic.com/~lynn/cp40seas1982.txt

CP/40 morphs into CP/67 when 360/67 standard with virtual memory becomes available. I was responsible for OS/360 running on 360/67 (as 360/65),
(univ shutdown datacenter on weekends and I had datacenter dedicated for
48hrs straight). CSC came out Jan1968 to install CP/67 (3rd install
after CSC itself and MIT Lincoln Labs) ... and I mostly played with it
during my weekend dedicated time. First couple months was rewriting pathlengths for running OS/360 in virtual machine. Benchmark was OS/360 jobstream that ran 322secs on real machine. Started out 858secs in
virtual machine (CP67 CPU 534secs) .... after few months got CP67 CPU
down to 113secs. I then rewrite time-sharing system scheduling and
dispatching, page I/O and page replacement, I/O arm scheduling, etc.
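A quick check of the benchmark figures, assuming (a simplification) that elapsed time in the virtual machine is just the real-machine time plus CP67 CPU time: 322 + 534 = 856 secs, close to the measured 858 secs, and cutting CP67 time to 113 secs would put the same jobstream near 435 secs.

```python
REAL_SECS = 322  # OS/360 jobstream on the bare machine


def vm_slowdown(cp_secs: float, real_secs: float = REAL_SECS) -> float:
    """Elapsed-time ratio VM/bare, assuming elapsed = guest time + CP overhead."""
    return (real_secs + cp_secs) / real_secs


print(f"{vm_slowdown(534):.2f}x -> {vm_slowdown(113):.2f}x")  # 2.66x -> 1.35x
```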

I've joked that original CP/67 scheduling delivered to univ (and I
completely replaced) ... looked a lot like Unix scheduling that I first
saw 15yrs later. Also 1st install at univ (jan1968) had CP67 source in
OS/360 datasets ... it wasn't until a few months later that they moved
source to CMS files. After I graduated and joined science center, one of
my hobbies was enhanced production operating systems for internal
datacenters.

CP-67
https://en.wikipedia.org/wiki/CP-67
CP/CMS
https://en.wikipedia.org/wiki/CP/CMS
History of CP/CMS
https://en.wikipedia.org/wiki/History_of_CP/CMS
Cambridge Scientific Center
https://en.wikipedia.org/wiki/Cambridge_Scientific_Center

when it was decided to add virtual memory to all 370s, it was also
decided to rewrite CP67 for VM370, simplifying and/or dropping lots of
features (also renaming Cambridge Monitor System to Conversational
Monitor System and crippling its ability to run on real machine).

1974, I start migrating lots of original CP67 stuff (lots that I had
done as undergraduate) to VM370 Release2 base for an enhanced internal
CSC/VM (including for world-wide online sales&marketing support HONE
systems). Then in 1975 I upgrade to VM370 Release3 base and add the
CP67 multiprocessor support (one of the things dropped in CP67->VM370)
... originally for US consolidated HONE complex so they could add 2nd
processor to each of their systems (all the US HONE systems had been consolidated in Palo Alto, trivia: when FACEBOOK 1st moved into silicon
valley, it was into new bldg built next door to the former US
consolidated HONE datacenter).

--
virtualization experience starting Jan1968, online at home since Mar1970

From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Wed Jun 12 22:26:48 2024

On Wed, 12 Jun 2024 11:54:55 -1000, Lynn Wheeler wrote:

when it was decided to add virtual memory to all 370s, it was also
decided to rewrite CP67 for VM370, simplifying and/or dropping lots of features (also renaming Cambridge Monitor System to Conversational
Monitor System and crippling its ability to run on real machine).

I recall CMS was single-user to start with, and the point of running it
under “CP” aka “VM” was to offer a multi-user service. Did CMS ever become
multi-user in its own right?

From Lawrence D'Oliveiro@21:1/5 to EricP on Wed Jun 12 22:33:47 2024

On Wed, 12 Jun 2024 09:38:17 -0400, EricP wrote:

https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch04s08.html

That book is from 2002.

https://coral.googlesource.com/linux-imx/+/refs/heads/release-chef/arch/x86/entry/entry_32.S

That, too, seems a bit old. How about this for a more up-to-date
version: <https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_32.S>.
Or try the 64-bit version: <https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S>.

The problem I have with this approach is that it deals with all the
race conditions (eg a nested interrupt posts a new softirq between
when you checked for pending softirq's and the IRET) by running with interrupts disabled for long instruction sequences. I consider that
to be a poor way to do this as that blocks processing all other
interrupts.

But then again, things are complicated enough as it is.
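The race and the usual fix can be modelled in a few lines of Python. This is a sketch with hypothetical names, not the Linux code: pending softirqs are dequeued with interrupts disabled, run with interrupts enabled (so a nested interrupt may post more work), and the final "nothing pending" check is made with interrupts off, which on x86 stay off until IRET restores the saved flags.

```python
from typing import Callable, List


class CPU:
    def __init__(self) -> None:
        self.irq_enabled = True
        self.pending_softirqs: List[str] = []


def exit_to_interrupted_context(cpu: CPU,
                                run_softirq: Callable[[CPU, str], None]) -> str:
    cpu.irq_enabled = False
    while cpu.pending_softirqs:
        work = cpu.pending_softirqs.pop(0)  # dequeue with interrupts off
        cpu.irq_enabled = True              # deferred work runs interruptibly
        run_softirq(cpu, work)              # nested interrupts may post more here
        cpu.irq_enabled = False             # re-disable before re-checking
    # From the last empty check to the return, nothing can post new work.
    return "iret"
```

EricP's objection is about how long the real code keeps interrupts off, which this toy model does not capture.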

From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Jun 13 00:43:51 2024

Scott Lurndal wrote:

1) NMI are incredibly useful in certain cases, particularly for
in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the processor)
3) ARM refused to support NMI in Aarch64 (partially because they didn't
have a spare exception vector). They've backtracked and hacked in a
solution using the interrupt controller to create a pseudo-unmaskable
interrupt due to customer demand.

On an architecture where one has multiple simultaneous interrupt tables
(say 1 per Guest OS and 1 per HyperVisor) and each table manages 32K
individual interrupts, each interrupt masked by its corresponding Enable
bit::

Can one NOT infer that; a SW convention to leave at least 1 enable bit
always enabled, gives the system an NMI ??
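Mitch's suggestion can be sketched as a per-table enable bitmap where the software "mask everything" operation always preserves one reserved bit. Using vector 0 as the reserved vector, and the names below, are assumptions for illustration. Code that clears the bit directly can still mask it, so this is a software convention rather than an architectural NMI.

```python
class InterruptTable:
    """Hypothetical table: one per Guest OS or HyperVisor, 32K enable bits."""
    SIZE = 32 * 1024
    NMI_VECTOR = 0  # assumption: the vector reserved by SW convention

    def __init__(self) -> None:
        self.enable = 1 << self.NMI_VECTOR  # bit i set => interrupt i enabled

    def set_enable(self, vector: int, on: bool) -> None:
        if not 0 <= vector < self.SIZE:
            raise ValueError("vector out of range")
        if on:
            self.enable |= 1 << vector
        else:
            self.enable &= ~(1 << vector)

    def mask_all(self) -> None:
        # The convention: "mask everything" keeps the reserved bit set.
        self.enable &= 1 << self.NMI_VECTOR

    def deliverable(self, vector: int) -> bool:
        return bool((self.enable >> vector) & 1)
```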

From MitchAlsup1@21:1/5 to EricP on Thu Jun 13 00:34:34 2024

EricP wrote:

MitchAlsup1 wrote:

John Savard wrote:

After all, RowHammer is caused by multiple rapid-fire accesses to the
same address, or to related addresses, in memory.

Yes, the write buffer in my DRAM controller is the L3 cache. Modified
data in the L3 migrates towards DRAM as DRAM cycles permit, but there
is no way to cause a line to be continuously written into DRAM.
If a modified line has migrated to DRAM, and it gets modified again
in the L3, that 2nd write will not be performed until a refresh cycle
on that DRAM is performed.

Thus if one tries to RowHammer My 66000 DRAM, the DRAM gets a refresh cycle
between each write.

What does it do if L3 receives more writes than it has ways in a row,
does it stall evicts from L2?

There is a 128 line temporally organized buffer between L3 and DRAM.
So with your proposed 4-way L3, you have 130 accesses between banging
on one particular line a second time. This buffer is also SNOOPed, acting
like a victim cache. And likewise a similar buffer on the read side,
acting like a prefetch buffer. There is a connection between the
buffers so that when an ECC error is corrected, the corrected data is
migrated back into DRAM.

Let's say L3 is 4-way assoc and all four ways in an L3 row have been updated,
then a 5th way in that same row is written from L2.

L3 dumps a selected line into the DRAM write staging buffer. This
staging buffer is controlled by high and low water marks, and
operated to minimize DRAM->CHIP and CHIP->DRAM bus turn arounds.
You try to insert refresh cycles on the DRAM while its pins are
doing their electrical turn around thing.
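The high/low water-mark policy is a hysteresis loop, sketched here in Python with assumed thresholds (96 and 32 of a 128-line buffer) and a hypothetical per-step write budget. The controller only turns the bus around to write when the buffer crosses the high mark and keeps writing until it falls to the low mark, so turnarounds come in pairs per batch rather than per line.

```python
from typing import List


class WriteStagingBuffer:
    """Hypothetical DRAM write staging buffer with high/low water marks."""

    def __init__(self, high: int = 96, low: int = 32) -> None:
        assert 0 <= low < high
        self.high, self.low = high, low
        self.lines: List[int] = []
        self.writing = False  # False: bus is in the read direction
        self.turnarounds = 0  # read->write plus write->read switches

    def enqueue(self, line: int) -> None:
        self.lines.append(line)

    def tick(self, write_budget: int) -> List[int]:
        """One scheduling step; returns the lines written to DRAM."""
        if not self.writing and len(self.lines) >= self.high:
            self.writing = True
            self.turnarounds += 1
        written: List[int] = []
        if self.writing:
            while len(self.lines) > self.low and len(written) < write_budget:
                written.append(self.lines.pop(0))
            if len(self.lines) <= self.low:
                self.writing = False
                self.turnarounds += 1
        return written
```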

L3 has no place to hold that 5th way and it can't evict one
of the other 4 ways because that could cause rowhammer.

Your typical L3 has a lot more ways than you postulate.

Seems to me that all it can do is stall the 5th write from L2 until
DRAM refresh rolls around and re-enables one of the pending L3 writes,
which would back up victim evicts from L2.
Or maybe L3 has a small fully assoc emergency overflow buffer,
but still that could fill up too.

Buffers my friend, buffers.

From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Thu Jun 13 01:45:49 2024

On Wed, 12 Jun 2024 15:19:14 -1000, Lynn Wheeler wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

I recall CMS was single-user to start with, and the point of running it
under “CP” aka “VM” was to offer a multi-user service. Did CMS ever become multi-user in its own right?

over years relying more & more on CP kernel services, no multi-user ...
but did get multitasking ...

Interesting. This fits in with the idea that the “CP” in “CP/CMS” (later
“VM/CMS”) was invented purely/primarily in order to turn a single-user OS into a kind-of-multi-user OS.

trivia: my brother was regional Apple rep (largest physical area CONUS)
and when he came into town, I could be invited to business dinners and
argue MAC design (even before MAC announced).

So what did you think of it? The original hardware architecture was
heavily centred around the 60.15Hz video refresh. Each refresh interval,
21888 bytes were read out of the video buffer (for the 512×342 display),
and 740 bytes were read out of the sound buffer to go to the speaker.
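The figures are easy to check: at one bit per pixel, 512×342 pixels is 21888 bytes, and multiplying by the refresh rate gives the memory bandwidth the video and sound hardware consumed. A quick Python check (the per-second totals are derived here, not quoted from the post):

```python
WIDTH, HEIGHT = 512, 342  # original Macintosh screen, 1 bit per pixel
REFRESH_HZ = 60.15
SOUND_BYTES_PER_FRAME = 740

frame_bytes = WIDTH * HEIGHT // 8
video_bytes_per_sec = frame_bytes * REFRESH_HZ
sound_bytes_per_sec = SOUND_BYTES_PER_FRAME * REFRESH_HZ

print(frame_bytes)                 # 21888
print(round(video_bytes_per_sec))  # 1316563
print(round(sound_bytes_per_sec))  # 44511
```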

The serial controller chip, a Zilog 8530, was remarkably flexible, too.
The third-party “MacRecorder” device involved reprogramming that to
receive digital sound data from the external hardware microphone/dongle
that plugged into the serial port, back before Macs had “official” sound input.

From Lynn Wheeler@21:1/5 to Lawrence D'Oliveiro on Wed Jun 12 15:19:14 2024

Lawrence D'Oliveiro <[email protected]d> writes:

I recall CMS was single-user to start with, and the point of running it
under “CP” aka “VM” was to offer a multi-user service. Did CMS ever become
multi-user in its own right?

over years relying more & more on CP kernel services, no multi-user
... but did get multitasking
https://www.ibm.com/docs/en/zvm/7.3?topic=cms-application-multitasking
https://www.ibm.com/docs/en/zvm/7.3?topic=programming-zvm-cms-application-multitasking
https://www.vm.ibm.com/pubs/redbooks/sg245164.pdf

original CMS that could run on real hardware supported SIO and channel
programs for file i/o ... a CP "diagnose" function for CMS file i/o was
added to CP/67 that ran purely synchronous (didn't return to CMS until
file I/O was completed) ... in transition to VM370, CMS went purely for
CP "diagnose" (and SIO capability was eliminated).

When I joined science center and also saw the virtual memory file
support by MULTICS ... I figured I could do one for CMS ... that scaled
up faster than the normal file I/O operation ... and I claimed I learned
what not to do for a page-mapped filesystem from TSS/360 (part of
TSS/360 was just memory mapped the filesystem then mostly faulted in
pages ... while I did combination of memory mapping and pre-fetching, read-ahead and write-behind support).

One of the IBM Future System issues was specifying a TSS/360-like
filesystem ... one of the last nails in the FS coffin was a study that
showed if 370/195 applications were ported to an FS machine made out of the fastest available hardware, it would have the throughput of a 370/145 (about
30 times slowdown ... part of it was serialization of file i/o).

Some existing FS descriptions talk about how FS lived on with S/38 ...
for entry-level business operation ... there was sufficient hardware performance to provide the necessary throughput for the s/38 market.

In any case, the FS implosion contributed to memory mapped filesystem implementations acquiring very bad reputation inside IBM. In 1980s, I
could show that heavily loaded, high-end systems with 3380 (3mbyte/sec
disks) running my page-mapped CMS filesystem had at least three times
the sustained throughput of standard CMS filesystems,

some FS
http://www.jfsowa.com/computer/memo125.htm
https://people.computing.clemson.edu/~mark/fs.html

trivia: my brother was regional Apple rep (largest physical area CONUS)
and when he came into town, I could be invited to business dinners and
argue MAC design (even before MAC announced). He also figured out how to remotely dial into the S/38 that ran Apple to monitor manufacturing and delivery schedules.

--
virtualization experience starting Jan1968, online at home since Mar1970

From Lawrence D'Oliveiro@21:1/5 to All on Thu Jun 13 01:46:40 2024

On Thu, 13 Jun 2024 00:43:51 +0000, MitchAlsup1 wrote:

Can one NOT infer that; a SW convention to leave at least 1 enable bit
always enabled, gives the system an NMI ??

Every interrupt needs to be maskable at some point, if only to avoid
infinite recursion and resulting stack overflow.

From Lynn Wheeler@21:1/5 to Lawrence D'Oliveiro on Wed Jun 12 16:13:03 2024

Lawrence D'Oliveiro <[email protected]d> writes:

So what did you think of it? The original hardware architecture was
heavily centred around the 60.15Hz video refresh. Each refresh interval, 21888 bytes were read out of the video buffer (for the 512×342 display),
and 740 bytes were read out of the sound buffer to go to the speaker.

biggest issue was what I characterized as kitchen table "only" with no
business uses ... desktop publishing was somewhat in between (visicalc
wasn't supposedly part of it)... at a time when large corporations were
ordering tens of thousands of IBM/PC with 3270 terminal emulation
... single desktop footprint doing both mainframe terminal and
increasing kinds of local processing.

later IBM co-worker left and did some work for Apple using Cray with 100mbyte/sec high-end graphics ... could be used to simulate various
processor and graphic performance ... part of the joke that Cray used
Apple to design Cray machines and Apple used Cray machines to design
Apple machines.

some history
https://arstechnica.com/features/2005/12/total-share/
https://arstechnica.com/features/2005/12/total-share.ars/2
https://arstechnica.com/features/2005/12/total-share.ars/3
https://arstechnica.com/features/2005/12/total-share.ars/4
https://arstechnica.com/features/2005/12/total-share.ars/5
https://arstechnica.com/features/2005/12/total-share.ars/6
https://arstechnica.com/features/2005/12/total-share.ars/7
https://arstechnica.com/features/2005/12/total-share.ars/8
https://arstechnica.com/features/2005/12/total-share.ars/9
https://arstechnica.com/features/2005/12/total-share.ars/10

--
virtualization experience starting Jan1968, online at home since Mar1970

From Michael S@21:1/5 to Lawrence D'Oliveiro on Thu Jun 13 11:41:25 2024

On Thu, 13 Jun 2024 01:46:40 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Thu, 13 Jun 2024 00:43:51 +0000, MitchAlsup1 wrote:

Can one NOT infer that; a SW convention to leave at least 1 enable
bit always enabled, gives the system an NMI ??

Every interrupt needs to be maskable at some point, if only to avoid
infinite recursion and resulting stack overflow.

Edge-sensitive interrupt is effectively masked for as long as it is
latched.

From Stefan Monnier@21:1/5 to All on Thu Jun 13 10:56:16 2024

Rowhammer can modify nearby lines, not just the ones that are being
hammered, right? How do you guarantee that all neighbors will also
be refreshed?

I don't know the answer to this one.

Similarly, if the accesses are LOCK XADD operations, and you have multiple CPUs (or cores not sharing a common last level cache), then I don't see any way to keep those accesses from making it all the way to the RAM chips?

But I can answer that one: don't!
I.e. the DRAM is attached to one and only one CPU. Any other CPU that
wants to access that DRAM has to do it through that DRAM's CPU, which
will make it pass through its shared "last level" cache.
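Stefan's rule (each DRAM has a single owning CPU, and everyone else goes through that CPU's last-level cache) can be sketched in Python like this. The 4 KiB interleave used to pick the home node and the names are assumptions; the point is that contending LOCK XADD-style operations from any CPU all land on one cached copy, so they need not reach the DRAM pins at all.

```python
from typing import Dict, List

INTERLEAVE = 4096  # assumption: home node chosen per 4 KiB block


class Node:
    """A CPU package with the only path to its locally attached DRAM."""

    def __init__(self) -> None:
        self.llc: Dict[int, int] = {}   # shared last-level cache: addr -> value
        self.dram: Dict[int, int] = {}  # backing DRAM, written only on eviction


def home_of(nodes: List[Node], addr: int) -> Node:
    return nodes[(addr // INTERLEAVE) % len(nodes)]


def fetch_and_add(nodes: List[Node], addr: int, delta: int) -> int:
    """XADD semantics (return the old value), performed at the home node's LLC."""
    home = home_of(nodes, addr)
    if addr not in home.llc:
        home.llc[addr] = home.dram.get(addr, 0)  # fill on first touch
    old = home.llc[addr]
    home.llc[addr] = old + delta
    return old
```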

Stefan

From Scott Lurndal@21:1/5 to [email protected] on Thu Jun 13 15:34:30 2024

[email protected] (MitchAlsup1) writes:

Scott Lurndal wrote:

1) NMI are incredibly useful in certain cases, particularly for
in-kernel debuggers.
2) NMI is actually maskable on Intel hardware (in the chipset, not the
processor)
3) ARM refused to support NMI in Aarch64 (partially because they didn't
have a spare exception vector). They've backtracked and hacked in a
solution using the interrupt controller to create a pseudo-unmaskable
interrupt due to customer demand.

On an architecture where one has multiple simultaneous interrupt tables
(say 1 per Guest OS and 1 per HyperVisor) and each table manages 32K
individual interrupts, each interrupt masked by its corresponding Enable
bit::

Can one NOT infer that; a SW convention to leave at least 1 enable bit
always enabled, gives the system an NMI ??

CLI masks everything (except NMI) on x86 cores. Likewise PSTATE.I and PSTATE.F on Aarch64 cores.

On arm64, there are only two interrupts presented to the CPU (IRQ, FIQ).

Interrupt prioritization, assignment to CPU signal, and pending status are managed by the interrupt controller (GIC), which has routing tables, security assignments, priority and mask bits for each class of interrupt (SGI, PPI, ePPI, SPI, eSPI and LPI). SGI has 16, PPI has 16, SPI has 950,
ePPI and eSPI extend the PPI and SPI ranges, and LPIs support 24-bit interrupt numbers.
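A greatly simplified Python model of the selection step described above: among pending, enabled interrupts, the controller presents the most urgent one (numerically lowest priority value) only if it beats the CPU's priority mask, on either the IRQ or FIQ line. Running priority, preemption, security states and affinity routing are all omitted, and the field names are illustrative rather than the GIC register layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class IntState:
    enabled: bool
    priority: int         # lower value = more urgent, as in GIC priority fields
    signal: str           # "IRQ" or "FIQ" (stand-in for group/security routing)
    pending: bool = False


def select(ints: Dict[int, IntState], pmr: int) -> Optional[Tuple[int, str]]:
    """Return (intid, signal) to present to the CPU, or None if all are masked."""
    best: Optional[int] = None
    for intid, s in ints.items():
        if not (s.pending and s.enabled and s.priority < pmr):
            continue  # masked by its enable bit or by the priority mask
        if best is None or s.priority < ints[best].priority:
            best = intid
    return None if best is None else (best, ints[best].signal)
```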

From EricP@21:1/5 to Michael S on Thu Jun 13 11:35:55 2024

Michael S wrote:

On Thu, 13 Jun 2024 01:46:40 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Thu, 13 Jun 2024 00:43:51 +0000, MitchAlsup1 wrote:

Can one NOT infer that; a SW convention to leave at least 1 enable
bit always enabled, gives the system an NMI ??

Every interrupt needs to be maskable at some point, if only to avoid
infinite recursion and resulting stack overflow.

Edge-sensitive interrupt is effectively masked for as long as it is
latched.

The 8086 reset the internal NMI latch after it pushed the trap frame on
the stack (flags low, flags high, CS, IP) and jumped to the NMI vector,
so a subsequent rising edge triggered another NMI.

From Scott Lurndal@21:1/5 to Michael S on Thu Jun 13 15:36:16 2024

Michael S <[email protected]> writes:

On Thu, 13 Jun 2024 01:46:40 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Thu, 13 Jun 2024 00:43:51 +0000, MitchAlsup1 wrote:

Can one NOT infer that; a SW convention to leave at least 1 enable
bit always enabled, gives the system an NMI ??

Every interrupt needs to be maskable at some point, if only to avoid
infinite recursion and resulting stack overflow.

Edge-sensitive interrupt is effectively masked for as long as it is
latched.

Not necessarily. Subsequent edge assertions while the interrupt is pending will be coalesced such that there will be only one delivery.
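The two behaviours described in these replies (edges coalescing while the latch is pending, and the 8086 clearing its NMI latch once the trap frame is pushed) fit in one small Python model. The names are hypothetical and the model tracks a single input line.

```python
class EdgeLatch:
    """Hypothetical model of a single edge-triggered interrupt input."""

    def __init__(self) -> None:
        self.level = 0
        self.pending = False
        self.delivered = 0

    def drive(self, level: int) -> None:
        # Latch only on a rising edge; an edge while already pending is coalesced.
        if level and not self.level:
            self.pending = True
        self.level = level

    def acknowledge(self) -> None:
        # The CPU takes the interrupt and clears the latch, so the next
        # rising edge can latch again.
        if self.pending:
            self.pending = False
            self.delivered += 1
```

Three rising edges that arrive before the CPU acknowledges produce a single delivery; an edge after the acknowledgement latches again.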

From MitchAlsup1@21:1/5 to Paul A. Clayton on Thu Jun 13 23:48:14 2024

Paul A. Clayton wrote:

On 6/8/24 1:37 PM, MitchAlsup1 wrote:

EricP wrote:

Scott Lurndal wrote:

[snip]

What they found was that not only did they not need 4 levels,
it was pointless overhead to have to constantly switch between them.
(There is a pretty high penalty to switching modes, copying in args,
validating args, doing something usually simple, then switching back,
when it is all the OS's code anyway.)

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

But for similar reasons ring 1 and 2 are not used in x86 machines,
either. {{Now, if we could just go back to 1982 and not invent
IDTs, and call gates, .....}}

I thought My 66000 had Port Holes that are vaguely similar to
call gates, so rather than "not invent" perhaps invent with better
semantics and a better interface?

I would place them congruent to Load-From and Store-TO PDP-11/70
instructions.

I have since converted to a more Linux friendly MMU structure.
Port Holes can be easily resurrected.

(Though 1982 might have been too
early to implement such. Better perceiving when to wait for the
technology or understanding to implement something better is
presumably one of the skills acquired by long experience as well
as the related what can be implemented to provide the most attractive/marketable features without excessively limiting future developments.

Letting a competitor provide a temporarily better
product — or delaying entry into a market expecting a feature —
can sometimes be sensible if one expects to leapfrog with
a better long-term alternative, but "worse is better" has some
truth.)

It seems that in terms of computer architectures, the world is
not going to beat a path to your door even if you invent a
better mousetrap.

From Lawrence D'Oliveiro@21:1/5 to All on Fri Jun 14 00:36:20 2024

On Thu, 13 Jun 2024 23:48:14 +0000, MitchAlsup1 wrote:

It seems that in terms of computer architectures, the world is not going
to beat a path to your door even if you invent a better mousetrap.

There is an inherent conflict between wanting an idea to be widely
adopted, and wanting to maximize your profit from it.

From John Savard@21:1/5 to [email protected] on Fri Jun 14 02:57:46 2024

On Fri, 14 Jun 2024 00:36:20 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

On Thu, 13 Jun 2024 23:48:14 +0000, MitchAlsup1 wrote:

It seems that in terms of computer architectures, the world is not going
to beat a path to your door even if you invent a better mousetrap.

There is an inherent conflict between wanting an idea to be widely
adopted, and wanting to maximize your profit from it.

But the failure of RISC-B to make x86 obsolete shows that even giving
it away for free is not enough. Because not being able to run your old
Windows programs is the real problem.

John Savard

From John Savard@21:1/5 to All on Fri Jun 14 02:55:49 2024

On Thu, 13 Jun 2024 23:48:14 +0000, [email protected] (MitchAlsup1)
wrote:

It seems that in terms of computer architectures, the world is
not going to beat a path to your door even if you invent a
better mousetrap.

And we all know the main reason for that. If you already have a
computer that you have bought software for, when you upgrade your
computer, you want to be able to move all that software over to your
new computer without additional costs or issues.

That means the new computer must have the same ISA and run the same
operating system, or that operating system's fully upwards-compatible successor.
That means that x86 is king for the foreseeable future.

But _some_ people use Linux, which essentially makes them free to hop
to any ISA for which the Gnu C compiler works.

The other option, of course, is a new niche - and so while the desktop
is Windows on the x86, smartphones are Android on ARM.

John Savard

From Scott Lurndal@21:1/5 to EricP on Fri Jun 14 16:41:24 2024

EricP <[email protected]> writes:

Lawrence D'Oliveiro wrote:

On Wed, 12 Jun 2024 09:38:17 -0400, EricP wrote:

https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch04s08.html

That book is from 2002.

https://coral.googlesource.com/linux-imx/+/refs/heads/release-chef/arch/x86/entry/entry_32.S

That, too, seems a bit old. How about this for a more up-to-date
version:
<https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_32.S>.
Or try the 64-bit version:
<https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S>.

Thanks, I'll have a look at that entry.s. It looks quite different.
The copyright on common.c file I referenced was 2015 so those
files seemed to be relatively up to date and being maintained.

The problem I have with this approach is that it deals with all the
race conditions (eg a nested interrupt posts a new softirq between
when you checked for pending softirq's and the IRET) by running with
interrupts disabled for long instruction sequences. I consider that
to be a poor way to do this as that blocks processing all other
interrupts.

But then again, things are complicated enough as it is.

The cautionary tale here is that the return code path is complicated
exactly because it wasn't sorted out during the ISA and HW design phase.

Or perhaps, the cautionary tale is that a 1970 architecture
must adapt to new paradigms over five decades, and backward
compatibility requirements lead to inevitable complexity.

From EricP@21:1/5 to Lawrence D'Oliveiro on Fri Jun 14 12:24:42 2024

Lawrence D'Oliveiro wrote:

On Wed, 12 Jun 2024 09:38:17 -0400, EricP wrote:

https://www.oreilly.com/library/view/understanding-the-linux/0596002130/ch04s08.html

That book is from 2002.

https://coral.googlesource.com/linux-imx/+/refs/heads/release-chef/arch/x86/entry/entry_32.S

That, too, seems a bit old. How about this for a more up-to-date
version: <https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_32.S>.
Or try the 64-bit version: <https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S>.

Thanks, I'll have a look at that entry.s. It looks quite different.
The copyright on common.c file I referenced was 2015 so those
files seemed to be relatively up to date and being maintained.

The problem I have with this approach is that it deals with all the
race conditions (eg a nested interrupt posts a new softirq between
when you checked for pending softirq's and the IRET) by running with
interrupts disabled for long instruction sequences. I consider that
to be a poor way to do this as that blocks processing all other
interrupts.

But then again, things are complicated enough as it is.

The cautionary tale here is that the return code path is complicated
exactly because it wasn't sorted out during the ISA and HW design phase.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Fri Jun 14 21:56:19 2024

On Fri, 14 Jun 2024 16:41:24 GMT, Scott Lurndal wrote:

Or perhaps, the cautionary tale is that a 1970 architecture must adapt
to new paradigms over five decades, and backward compatibility
requirements lead to inevitable complexity.

With Linux, it’s easy enough to compare the corresponding code in the
source subdirectories specific to the other architectures it supports.

From Lawrence D'Oliveiro@21:1/5 to John Savard on Fri Jun 14 21:59:29 2024

On Fri, 14 Jun 2024 02:57:46 -0600, John Savard wrote:

But the failure of RISC-B to make x86 obsolete shows that even giving it
away for free is not enough. Because not being able to run your old
Windows programs is the real problem.

You mean RISC-V?

I think it is succeeding in its goals. From what I hear, it’s already shipping in the billions of units per year, in a similar league to ARM.

Compare this to x86, which at its peak was shipping about 360 million
units per year (a million a day), but is now down to about 280 million,
and continues to stagnate. Sure, there’s a lot more money to be made from those hundreds of millions of x86 chips than from those billions of RISC-V
and ARM chips.

Moral: the desktop is not the centre of the computing universe. It is only
a small part of it.

From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Fri Jun 14 22:14:21 2024

On Fri, 14 Jun 2024 22:10:23 -0000 (UTC), Thomas Koenig wrote:

John Savard <[email protected]d> schrieb:

But _some_ people use Linux, which essentially makes them free to hop
to any ISA for which the Gnu C compiler works.

It's not quite that simple - if you try to build a modern web browser for POWER on Linux, for example, you're in for quite an adventure.

Endianness assumptions? I think essentially all of the basic toolchain is already available, so what’s left would be mostly bugs in the app code itself. For which I’m sure they would accept patches.

From Thomas Koenig@21:1/5 to John Savard on Fri Jun 14 22:10:23 2024

John Savard <[email protected]d> schrieb:

But _some_ people use Linux, which essentially makes them free to hop
to any ISA for which the Gnu C compiler works.

It's not quite that simple - if you try to build a modern web browser
for POWER on Linux, for example, you're in for quite an adventure.

From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Sat Jun 15 07:43:09 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Fri, 14 Jun 2024 22:10:23 -0000 (UTC), Thomas Koenig wrote:

It's not quite that simple - if you try to build a modern web browser for
POWER on Linux, for example, you're in for quite an adventure.

Endianness assumptions?

OpenPower is little-endian, so I doubt that this is the reason. From
what I read, Web browsers are a beast to build.

I think essentially all of the basic toolchain is
already available, so what’s left would be mostly bugs in the app code itself.

There's a huge difference between what application maintainers
consider bugs in application code and what the C and C++ compiler
maintainers do.

For which I’m sure they would accept patches.

Who would write them? I have posed a challenge to advocates of
undefined behaviour as the way to efficiency in <[email protected]>:

|Write a proof-of-concept Forth interpreter in the language you
|advocate that runs at least one of bubble-sort, matrix-mult or sieve
|from bench/forth in
|<http://www.complang.tuwien.ac.at/forth/bench.zip>

Nobody has risen to the challenge, much less submitted patches to
convert Gforth to the kind of C code that gcc officially supports.

Fortunately, the practice is quite a bit better than what the
advocates threaten, so Gforth builds nicely on RISC-V, and, last I
tried it, on Power.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From John Dallman@21:1/5 to D'Oliveiro on Sat Jun 15 08:28:00 2024

In article <v4ieg1$32kuq$[email protected]>, [email protected]d (Lawrence
D'Oliveiro) wrote:

You mean RISC-V?

I think it is succeeding in its goals. From what I hear, it's
already shipping in the billions of units per year, in a similar
league to ARM.

As an open-source project, it doesn't seem to have "goals" in quite the
same way as a project controlled by a single organisation, or a small
group of them.

It's competing effectively with ARM in the embedded world. For mobile,
desktop and datacentre, things have not progressed that far. When SiFive abandoned development of high-powered general-purpose CPUs, the push into
those spaces faltered. I'm sorry about that: I was looking forward to
having another architecture to learn and support.

John

From Thomas Koenig@21:1/5 to Anton Ertl on Sat Jun 15 09:59:15 2024

Anton Ertl <[email protected]> schrieb:

Lawrence D'Oliveiro <[email protected]d> writes:

On Fri, 14 Jun 2024 22:10:23 -0000 (UTC), Thomas Koenig wrote:

It's not quite that simple - if you try to build a modern web browser for POWER on Linux, for example, you're in for quite an adventure.

Endianness assumptions?

OpenPower is little-endian, so I doubt that this is the reason. From
what I read, Web browsers are a beast to build.

So they are, especially the build times.

But for an unsupported architecture: if you want an idea of what
needed to be done for Chrome at one time, look at
<https://github.com/shawnanastasio/chromium_power>. It's a lot of
configuration stuff, but also some code.

From John Dallman@21:1/5 to D'Oliveiro on Sat Jun 15 12:16:00 2024

In article <v4ifbt$32kuq$[email protected]>, [email protected]d (Lawrence
D'Oliveiro) wrote:

On Fri, 14 Jun 2024 22:10:23 -0000 (UTC), Thomas Koenig wrote:

It's not quite that simple - if you try to build a modern web
browser for POWER on Linux, you're in for quite
an adventure.

Endianness assumptions? I think essentially all of the basic
toolchain is already available, so what's left would be mostly bugs
in the app code itself. For which I'm sure they would accept
patches.

There are a _lot_ of libraries and other components that go into a modern
web browser, many of which will never have been built on POWER. The JITer
for the Javascript engine, and the Web Assembly translator seem to be
among them, and they need to make use of the native instruction set.
That's not a bug fix, that's a significant implementation task.

A modern web browser does a _lot_ more than interpret HTML and display bitmaps, and most of the code for the extra functionality is in the
browser. They're more like multimedia operating systems than document
viewers.

John

From Lawrence D'Oliveiro@21:1/5 to John Dallman on Sat Jun 15 22:33:03 2024

On Sat, 15 Jun 2024 12:16 +0100 (BST), John Dallman wrote:

There are a _lot_ of libraries and other components that go into a
modern web browser, many of which will never have been built on POWER.
The JITer for the Javascript engine, and the Web Assembly translator
seem to be among them, and they need to make use of the native
instruction set. That's not a bug fix, that's a significant
implementation task.

But those are not required for correctness, only for efficiency. The
original question, as I understood it, was to get the code running on the specified architecture, not necessarily to get it running at top speed.
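The correctness-versus-speed distinction can be sketched with a toy engine that has a portable interpreter plus a "JIT" backend available only on some architectures. The bytecode format, `JIT_BACKENDS`, and the closure-based stand-in for code generation are all hypothetical. A real JavaScript engine's JIT emits machine code for a specific ISA, and writing that backend is the per-architecture work John describes.

```python
import platform

# Hypothetical stack-machine bytecode: computes 6 * 7 + 2.
PROGRAM = [("push", 6), ("push", 7), ("mul", None), ("push", 2), ("add", None)]

# Hypothetical set of machines that have a "JIT" backend in this toy engine.
JIT_BACKENDS = {"x86_64", "AMD64", "aarch64", "arm64"}


def interpret(program):
    """Portable path: decode every instruction each time it runs."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def jit_compile(program):
    """Stand-in for a JIT: translate the bytecode once into a list of
    closures, so no decoding happens at run time."""
    steps = []
    for op, arg in program:
        if op == "push":
            steps.append(lambda s, v=arg: s.append(v))
        elif op == "add":
            steps.append(lambda s: s.append(s.pop() + s.pop()))
        elif op == "mul":
            steps.append(lambda s: s.append(s.pop() * s.pop()))
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run():
        stack = []
        for step in steps:
            step(stack)
        return stack.pop()

    return run


def make_runner(program, machine=None):
    """Use the JIT where a backend exists, else fall back to the interpreter."""
    machine = machine or platform.machine()
    if machine in JIT_BACKENDS:
        return "jit", jit_compile(program)
    return "interpreter", lambda: interpret(program)


for m in ("x86_64", "ppc64le"):
    kind, run = make_runner(PROGRAM, m)
    print(m, kind, run())
```

Both paths return 44 for this program; only speed would differ between them.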

From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Sat Jun 15 22:31:14 2024

On Sat, 15 Jun 2024 07:43:09 GMT, Anton Ertl wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

For which I’m sure they would accept patches.

Who would write them?

Whoever cared.

From Michael S@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 01:55:26 2024

On Sat, 15 Jun 2024 22:33:03 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sat, 15 Jun 2024 12:16 +0100 (BST), John Dallman wrote:

There are a _lot_ of libraries and other components that go into a
modern web browser, many of which will never have been built on
POWER. The JITer for the Javascript engine, and the Web Assembly
translator seem to be among them, and they need to make use of the
native instruction set. That's not a bug fix, that's a significant implementation task.

But those are not required for correctness, only for efficiency. The
original question, as I understood it, was to get the code running on
the specified architecture, not necessarily to get it running at top
speed.

Do you use video codecs in FF for correctness or only for efficiency?
;-)

From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jun 16 02:52:02 2024

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for efficiency?

I think FFmpeg is one of those basic toolkits that has already been ported
to OpenPOWER.

From Thomas Koenig@21:1/5 to Michael S on Sun Jun 16 08:00:02 2024

Michael S <[email protected]> schrieb:

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for
efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

<https://ffmpeg.org/ffmpeg-codecs.html> says yes, if you use <http://www.openh264.org/>.

From Michael S@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 10:34:47 2024

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for
efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jun 16 08:49:42 2024

On Sun, 16 Jun 2024 10:34:47 +0300, Michael S wrote:

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC) Lawrence D'Oliveiro
<[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

I’m surprised you didn’t know, since you were the one who mentioned it.

It has options to build against toolkits for every codec and file format
that is still worth using these days, and a few more besides.

From Michael S@21:1/5 to Thomas Koenig on Sun Jun 16 12:51:42 2024

On Sun, 16 Jun 2024 08:00:02 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for
efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

<https://ffmpeg.org/ffmpeg-codecs.html> says yes, if you use <http://www.openh264.org/>.

Thank you.
I see that ppc64el is not supported, but has been verified to work.
Hopefully that means it doesn't just show something below FHD
resolution, but can play without dropping frames, which is not so easy
when done purely in software.

From Michael S@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 12:43:40 2024

On Sun, 16 Jun 2024 08:49:42 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 10:34:47 +0300, Michael S wrote:

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC) Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for
efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

I’m surprised you didn’t know, since you were the one who mentioned
it.

It has options to build against toolkits for every codec and file
format that is still worth using these days, and a few more besides.

All I know about it is that a typical FF installation on x86-64 uses a
plug-in provided by Cisco. I have no idea if the reason for it is
technical or legal.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jun 16 09:56:29 2024

On Sun, 16 Jun 2024 12:43:40 +0300, Michael S wrote:

On Sun, 16 Jun 2024 08:49:42 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 10:34:47 +0300, Michael S wrote:

Is it capable of decoding H.264?

I’m surprised you didn’t know, since you were the one who mentioned it.
It has options to build against toolkits for every codec and file
format that is still worth using these days, and a few more besides.

All I know about it is that a typical FF installation on x86-64 uses a
plug-in provided by Cisco. I have no idea if the reason for it is
technical or legal.

Or a Windows thing.

From Torbjorn Lindgren@21:1/5 to [email protected] on Sun Jun 16 12:47:40 2024

Michael S <[email protected]> wrote:

On Sun, 16 Jun 2024 08:49:42 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

It has options to build against toolkits for every codec and file
format that is still worth using these days, and a few more besides.

All I know about it is that a typical FF installation on x86-64 uses a
plug-in provided by Cisco. I have no idea if the reason for it is
technical or legal.

I believe the reason is patents... Which is another way of saying it
is/was for legal reasons.

H.264 is/was extremely heavily patented: the MPEGLA patent consortium's
list of the patents included in their license is 58 pages[1], with
three columns! A lot of that is due to national patent duplication,
but still, Firefox just says "More than 1000 matches" when I search
for US patents alone!

Now, a lot of the US patents expired during 2023, but did ALL of them?
And what about all the patents in other countries? Also remember that
this decision was made long ago, when all these patents were presumed
to be valid (including by courts) and there were real grumblings about
various people suing other people under these patents.

So when Cisco open-sourced their H.264 implementation under a BSD
2-clause license (back in 2014?) and let Firefox use it while being
covered by Cisco's MPEGLA agreement (unknown date)...

Well, since no one else stepped up (and there have been calls for that
since, well, Cisco), it was the *only* way Firefox could safely ship
binaries that could do H.264 out of the box. So lots of hand-wringing
and teeth-gnashing, but... no real choice if they wanted to survive.

IIRC, before (and after) that there were various workarounds with
external modules (say, the user providing FFmpeg) or just "use the
system video player" (possibly playing in a different window!), but
none of them were good solutions.

1. https://www.mpegla.com/wp-content/uploads/avc-att1.pdf

From Robert Swindells@21:1/5 to John Dallman on Sun Jun 16 16:58:58 2024

On Sat, 15 Jun 2024 12:16 +0100 (BST), John Dallman wrote:

In article <v4ifbt$32kuq$[email protected]>, [email protected]d (Lawrence D'Oliveiro) wrote:

On Fri, 14 Jun 2024 22:10:23 -0000 (UTC), Thomas Koenig wrote:

It's not quite that simple - if you try to build a modern web browser
for POWER on Linux, for example, you're in for quite an adventure.

Endianness assumptions? I think essentially all of the basic toolchain
is already available, so what's left would be mostly bugs in the app
code itself. For which I'm sure they would accept patches.

There are a _lot_ of libraries and other components that go into a
modern web browser, many of which will never have been built on POWER.
The JITer for the Javascript engine, and the Web Assembly translator
seem to be among them, and they need to make use of the native
instruction set. That's not a bug fix, that's a significant
implementation task.

A modern web browser does a _lot_ more than interpret HTML and display bitmaps, and most of the code for the extra functionality is in the
browser. They're more like multimedia operating systems than document viewers.

Even the code that interprets HTML can be difficult to port to a less
common architecture, or even keep it working on that architecture.

In Firefox, a lot of it is written in Rust which is still changing
rapidly.

From Terje Mathisen@21:1/5 to Michael S on Sun Jun 16 21:23:45 2024

Michael S wrote:

On Sun, 16 Jun 2024 08:00:02 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 16 Jun 2024 02:52:02 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 16 Jun 2024 01:55:26 +0300, Michael S wrote:

Do you use video codecs in FF for correctness or only for
efficiency?

I think FFmpeg is one of those basic toolkits that has already been
ported to OpenPOWER.

Is it capable of decoding H.264?

<https://ffmpeg.org/ffmpeg-codecs.html> says yes, if you use
<http://www.openh264.org/>.

Thank you.
I see that ppc64el is not supported, but has been verified to work.
Hopefully that means it doesn't just show something below FHD
resolution, but can play without dropping frames, which is not so easy
when done purely in software.

Thanks, I know!

On the very first 4-core Intel CPUs their own reference codec only
managed 30 frames/second if everything was maxed out, i.e. CABAC
encoding, 60 frames/second, 1080p resolution, 40 Mbit/s bitrate.

(This was while running all 4 cores at 100%)

They paid me very well to show them how they could in fact double this,
but instead of also paying me to actually implement the code for them,
they licensed a chunk of VLSI to do it in HW. (Which I think was the
right thing to do.)
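Terje's figures translate into per-frame budgets with simple arithmetic, assuming 1080p means 1920 × 1080 pixels and using decimal units. This only restates the numbers quoted above; it is not taken from Intel's codec.

```latex
\begin{aligned}
\text{pixel rate} &= 1920 \times 1080 \times 60\ \text{frames/s} \approx 1.24 \times 10^{8}\ \text{pixels/s} \\
\text{bits per frame} &= \frac{40 \times 10^{6}\ \text{bit/s}}{60\ \text{frames/s}} \approx 6.7 \times 10^{5}\ \text{bit} \approx 83\ \text{kB} \\
\text{budget per frame at 60 fps} &= \tfrac{1}{60}\ \text{s} \approx 16.7\ \text{ms} \\
\text{time per frame at 30 fps} &= \tfrac{1}{30}\ \text{s} \approx 33.3\ \text{ms} \;\Rightarrow\; \text{a } 2\times \text{ speedup was needed}
\end{aligned}
```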

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Lawrence D'Oliveiro@21:1/5 to Torbjorn Lindgren on Mon Jun 17 00:10:38 2024

On Sun, 16 Jun 2024 12:47:40 -0000 (UTC), Torbjorn Lindgren wrote:

H.264 is/was extremely heavily patented: the MPEGLA patent consortium's
list of the patents included in their license is 58 pages[1], with
three columns!

But here’s the fun thing: a proprietary OS like Windows can include MPEG-4 H.264 playback for free, but if you want to play older-format DVD-Video (MPEG-2), that’s an extra cost.

Or it was, unless those patents have all expired by now.

H.265 is even worse. MPEG-LA has its own patent pool for that, but I think there is an entirely separate group also claiming “intellectual property” rights on aspects of that.

This is why Google and others are promoting AV1. That’s meant to be comparable to H.265 in performance and quality, but completely patent-unencumbered.

Which has not stopped some greedy groups from claiming they’re bound to
have some patents somewhere that are likely to apply. Details not publicly available, of course.

Now you know why the FFmpeg project is based in Hungary. Funny thing about
US patents, is that they only apply in the US.

From MitchAlsup1@21:1/5 to Paul A. Clayton on Tue Oct 22 21:08:24 2024

On Mon, 21 Oct 2024 0:42:32 +0000, Paul A. Clayton wrote:

THREAD NECROMANCY

On 6/11/24 5:18 PM, MitchAlsup1 wrote:
[snip]

I doubt that RowHammer still works when refreshes are interspersed
between accesses--RowHammer generally works because the events are
not protected by refreshes--the DRC sees the right ROW open and
simply streams at the open bank.

If one refreshes the two adjacent rows to avoid data disruption,
those refreshes would be adjacent reads to two other rows so it
seems one would have to be a little cautious about excessively
frequent refreshes.

Also note, there are no instructions in My 66000 that force a cache line
to DRAM, whereas there are instructions that can force a cache line
into L3.

How does a system suspend to DRAM if it cannot force a writeback
of all dirty lines to memory?

In GENERAL, you do not want to give this capability to applications
nor use it willy-nilly.

I am *guessing* this would not use a
special instruction but rather configuration of power management
that would cause hardware/firmware to clean the cache.

There is a sideband command from any master (anywhere) that causes
L3 to get dumped to DRAM over the next refresh interval. It is not
an instruction, and the TLB has to cooperate. A device may initiate
"suspend to DRAM" as well as a CPU (or any other bus master).

Writing back specific data to persistent memory might also
motivate cache block cleaning operations. Perhaps one could
implement such by copying from a cacheable mapping to a
non-cacheable (I/O?) memory?? (I simply remember that Intel added
instructions to write cache lines to persistent memory.)

L3 is the buffer to DRAM. Nothing gets to DRAM without
going through L3 and nothing comes out of DRAM that is not also
buffered by L3. So, if 96 cores simultaneously read a line residing in
DRAM, DRAM is read once and 95 cores are serviced through L3. So,
you can't RowHammer based on reading DRAM, either.
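The buffering argument can be sketched as a toy L3 that counts DRAM reads. `ToyL3` is hypothetical: it has no eviction, no timing, and no in-flight state (a real L3 would also merge requests for a line still being fetched). It only shows that repeated reads of one line cost a single DRAM read, while reads of distinct lines do not benefit.

```python
from collections import Counter


class ToyL3:
    """Hypothetical L3 that all DRAM traffic passes through. A line already
    held in L3 is served from L3 instead of being re-read from DRAM."""

    def __init__(self, line_size=64):
        self.line_size = line_size
        self.resident = set()        # line numbers currently held in L3
        self.dram_reads = Counter()  # DRAM reads per line number
        self.l3_hits = 0

    def read(self, addr):
        line = addr // self.line_size
        if line in self.resident:
            self.l3_hits += 1
        else:
            self.dram_reads[line] += 1   # one DRAM read fills the line
            self.resident.add(line)
        return line


same_line = ToyL3()
for core in range(96):
    same_line.read(0x1000)               # 96 cores read the same line

distinct = ToyL3()
for core in range(128):
    distinct.read(0x1000 + 64 * core)    # 128 cores read 128 different lines

print(sum(same_line.dram_reads.values()), same_line.l3_hits)  # 1 95
print(sum(distinct.dram_reads.values()), distinct.l3_hits)    # 128 0
```

The second loop corresponds to the distinct-lines case raised in the follow-up question; in this model L3 does not reduce that DRAM read count at all.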

If 128 cores read distinct cache lines from the same page quickly
enough to hammer the adjacent pages but not quickly enough to get
DRAM page open hits, this would seem to require relatively
frequent refreshes of adjacent DRAM rows.

DDR5 has a 64 GB/s transfer rate
128 cache lines (64B) is 8192 bytes
So this takes about 128 ns, roughly 1/8 of a microsecond.
A DDR5 refresh interval is 3.9µs.

https://www.micron.com/content/dam/micron/global/public/products/white-paper/ddr5-new-features-white-paper.pdf#:~:text=REFRESH%20commands%20are%20issued%20at%20an%20average%20periodic,of%20295ns%20for%20a%2016Gb%20DDR5%20SDRAM%20device.

So one has refreshes in the described situation.
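Worked through with the figures quoted (64 GB/s, 128 lines of 64 B, and a 3.9 µs average refresh interval), in decimal units:

```latex
\begin{aligned}
128 \times 64\ \text{B} &= 8192\ \text{B} \\
t_{\text{transfer}} &= \frac{8192\ \text{B}}{64 \times 10^{9}\ \text{B/s}} = 1.28 \times 10^{-7}\ \text{s} = 128\ \text{ns} \approx \tfrac{1}{8}\ \mu\text{s} \\
\frac{t_{\text{REFI}}}{t_{\text{transfer}}} &= \frac{3.9\ \mu\text{s}}{0.128\ \mu\text{s}} \approx 30
\end{aligned}
```

So on average one refresh command is issued for roughly every 30 such 128-line batches.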

Since the L3/memory controller could see that the DRAM row was
unusually active, it could increase prefetching while the DRAM
row was open and/or queue the accesses longer so that the
hammering frequency was reduced and page open hits would be more
common.

A DRAM row stays active; commands just CAS out more data. That is,
there is no row hammering--the word line remains asserted while
the sense amplifiers hold the captured data--while CASes are used
to strobe out more data {subject to refresh}.

The simple statement that L3 would avoid RowHammer by providing
the same cache line to all requesters seemed a bit too simple.

You need to investigate the difference between RAS and CAS for
DRAMs.
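The RAS/CAS distinction can be illustrated with a toy open-page bank that counts row activations, since RowHammer disturbance comes from repeatedly activating aggressor rows. `ToyBank` is a hypothetical model that ignores refresh, timing, multiple banks, and controller page policy (a closed-page controller would activate on every access).

```python
from collections import Counter


class ToyBank:
    """Hypothetical open-page DRAM bank. Reading a different row costs an
    activation (RAS/ACT; the precharge of the old row is left implicit).
    Reading the already-open row needs only a column access (CAS) from
    the sense amplifiers."""

    def __init__(self):
        self.open_row = None
        self.activations = Counter()  # ACT commands issued per row
        self.cas_count = 0

    def read(self, row, col):
        if self.open_row != row:
            self.activations[row] += 1
            self.open_row = row
        self.cas_count += 1
        return row, col


streaming = ToyBank()
for col in range(128):                  # 128 columns of one open row
    streaming.read(row=7, col=col)

alternating = ToyBank()
for i in range(128):                    # ping-pong between rows 6 and 8
    alternating.read(row=6 if i % 2 == 0 else 8, col=i)

print(dict(streaming.activations), streaming.cas_count)      # {7: 1} 128
print(dict(alternating.activations), alternating.cas_count)  # {6: 64, 8: 64} 128
```

Streaming columns from an open row costs one activation; alternating between the two rows that flank row 7 costs one activation per access, which is the access pattern hammering relies on.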

Your design may very well handle all the problematic cases,
perhaps even with minimal performance penalties for inadvertent
hammering and logging/notification for questionable activity just
like for error correction (and has been proposed for detected race conditions). I just know that these are hard problems.

From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Wed Oct 23 22:38:40 2024

On Sun, 9 Jun 2024 2:23:35 +0000, Lawrence D'Oliveiro wrote:

On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

VAX was before common era Hypervisors, do you think VAX could have
supported secure mode and hypervisor with their 4 levels ??

“Virtualization” was bandied about in the 1980s more as an idle, theoretical concept than as a practical one.

The question was: was the instruction set defined so that code designed to run in a privileged mode could be run unprivileged, so that any
attempt to do privileged things would be trapped and emulated by the
real privileged code? And there was nothing it could do to discover
it wasn’t running in privileged mode?

My 66000 ISA has this property, and it is used when hypervisors host hypervisors.

On the other hand, there is only 1 privileged instruction which
provides access to 4 separate control register spaces based on
current Core-Stack level.

(Obviously performance was not the issue here, but correctness was.)

For example, the VAX had a MOVPSL instruction that allowed read-only
access to the entire processor status register. Through this,
nonprivileged user-mode code could discover it was running in user mode, which would blow the illusion.
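The property being asked about is the classic trap-and-emulate requirement (Popek and Goldberg's condition that every sensitive instruction be privileged). A toy model, loosely following the MOVPSL example, shows how a single unprivileged read of the mode breaks the illusion. The three-instruction ISA, the `movpsl_traps` switch, and the class names are hypothetical; this does not model the real VAX.

```python
from dataclasses import dataclass


class Trap(Exception):
    """Raised when an instruction needs more privilege than the CPU has."""


@dataclass
class ToyMachine:
    mode: str = "user"  # the real hardware privilege mode


def execute(machine, instr, movpsl_traps):
    """Hypothetical ISA. 'halt' is privileged. 'movpsl' reads the current
    mode; whether it traps outside kernel mode is the design choice."""
    if instr == "add":
        return "ok"
    if instr == "halt":
        if machine.mode != "kernel":
            raise Trap(instr)
        return "halted"
    if instr == "movpsl":
        if movpsl_traps and machine.mode != "kernel":
            raise Trap(instr)
        return machine.mode  # an unprivileged read sees the *real* mode
    raise ValueError(f"unknown instruction {instr!r}")


class ToyHypervisor:
    """Runs guest-kernel code deprivileged and emulates whatever traps."""

    def __init__(self, movpsl_traps):
        self.machine = ToyMachine(mode="user")  # guest kernel really runs in user mode
        self.virtual_mode = "kernel"            # what the guest believes it runs in
        self.movpsl_traps = movpsl_traps

    def run_guest(self, instr):
        try:
            return execute(self.machine, instr, self.movpsl_traps)
        except Trap:
            # Emulate against the guest's virtual state, not the real one.
            if instr == "movpsl":
                return self.virtual_mode
            if instr == "halt":
                return "guest halted"
            raise


for traps in (True, False):
    hv = ToyHypervisor(movpsl_traps=traps)
    print(traps, hv.run_guest("add"), hv.run_guest("halt"), hv.run_guest("movpsl"))
# True ok guest halted kernel
# False ok guest halted user
```

When the mode read traps, the hypervisor can answer with the guest's virtual mode; when it does not, the guest sees "user" and can tell it is being virtualized.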

While illustrative, we have entered the realm where processor state
is closer to a cache line in size than a register in size. And the
processor (core) stack of software layers is closer to 4 cache lines
in size.

The Motorola 680x0 family was, I think, properly virtualizable in this
sense. Or maybe the 68020 and 68030 were, but the 68040 wasn't. I think the Motorola engineers working on the ’040 asked if any customers were interested in preserving the self-virtualization feature, and nobody
seemed to care.

During 020 development and testing, there was a mode whereby each
instruction executed raised every possible exception--this only found
99% of the virtualization problems.
