• Re: cramming 24 bits of address into 16 bits, was old and slow base and

    From John Levine@21:1/5 to All on Fri Jun 20 18:40:24 2025
    According to quadibloc <[email protected]>:
    I disagree with his solution, because index registers are for
    displacing from an address; base registers build an address.
    Locations in memory should be able to be addressed in a static
    manner.

    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat address space, but all of the actual machines were a lot smaller than that. Memory was expensive, so they needed compact address encoding. Putting full 24 bit addresses in the instructions would have made them too large and wasted a lot of
    space. The largest machine in the first set of 360s was only 512K, 19 address bits.

    So they needed a way to put partial addresses in the instructions. Considering the options, I think they did pretty well. The other possibilities would have been some kind of bank switching (which they did on the CTSS 7094) or local/page0 (PDP-5) or segments, all of which would have made programming a lot more painful. The scheme they chose meant that you set up a bunch of registers to point to your code and various chunks of data, and then efficiently used them
    as base registers to do your work, since it was rare for a single I/O buffer or data structure to be bigger than 4K. Arrays might be bigger than that, but that's what index registers are for.
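As a rough illustration of the scheme described above, here is a minimal Python sketch of base + index + 12-bit displacement addressing in a 24-bit space. The function name, the register contents, and the choice of register 12 as a data base are made up for the example; only the B+X+D form, the 4K displacement range, and the rule that register 0 contributes zero come from the architecture.

```python
ADDR_MASK = 0xFFFFFF  # 24-bit flat address space


def effective_address(regs, base, index, disp):
    """Compute B + X + D; register number 0 means 'no register' (adds zero)."""
    if not 0 <= disp < 4096:
        raise ValueError("displacement must fit in 12 bits (0..4095)")
    b = regs[base] if base != 0 else 0
    x = regs[index] if index != 0 else 0
    return (b + x + disp) & ADDR_MASK


# Hypothetical setup: R12 points at a 4K chunk of data, R3 holds an array offset.
regs = [0] * 16
regs[12] = 0x012000
regs[3] = 0x000010
addr = effective_address(regs, base=12, index=3, disp=0x7FC)
```

The instruction only carries 4 + 4 + 12 = 20 bits of address information, yet the result can land anywhere in the 24-bit space because the base register supplies the high-order bits.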

    I don't think that replacing the multiple base registers with a single implicit base register as Stephen Fuld has suggested would have been practical. Programs all used multiple base registers to address data. If you turned those into index
    registers, now most of your data accesses will in effect be double indexed (implicit base plus explicit index) and the extra trip through the adder was slow on small machines.

    One thing I do think they could have done better was branch addressing. Rather than B+D like they did for data, make it a 16 bit signed offset, shifted one bit
    left since instructions were halfword aligned. That meant you could jump up to 64K in either direction without a base register for code. They sort of faked it since the usual start to a routine was BALR R,0 to put the PC in register R to use as a base register, but they added that kind of relative branch later in S/390. The relative branch is still a single addition, same as B+D, so I would think the performance on small machines would be similar.
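A small Python sketch of the relative branch arithmetic suggested above: a 16-bit signed offset counted in halfwords, added to the PC. The helper names are hypothetical, and the choice of which PC value the offset is relative to is left to the caller as an assumption.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def relative_branch_target(pc, offset16):
    """Target = PC + (signed 16-bit offset << 1), wrapped to 24 bits."""
    return (pc + (sign_extend16(offset16) << 1)) & 0xFFFFFF


forward = relative_branch_target(0x1000, 0x0004)    # 4 halfwords forward
backward = relative_branch_target(0x1000, 0xFFFF)   # 1 halfword back
farthest_back = relative_branch_target(0x20000, 0x8000)
```

The offset range works out to -65536 through +65534 bytes, which is the "64K in either direction" reach, and the computation is one addition, just like B+D.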
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to EricP on Fri Jun 20 21:30:23 2025
    EricP <[email protected]> schrieb:

    A 48-bit instruction would have bits: 8 opcode, 4 function code,
    3 registers: rSrcDst, rBase, rIndex, and a 24-bit offset.
    The function code field is so we don't use up all the opcodes.

    A large part of the elegance of the /360 instruction set was its
    single-byte opcode. This led to some quirks, but it is a good
    exercise to try and keep that (without taking up too much of it,
    of course :-)

  • From EricP@21:1/5 to John Levine on Fri Jun 20 17:15:56 2025
    John Levine wrote:
    According to quadibloc <[email protected]>:
    I disagree with his solution, because index registers are for
    displacing from an address; base registers build an address.
    Locations in memory should be able to be addressed in a static
    manner.

    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat address space, but all of the actual machines were a lot smaller than that. Memory was
    expensive, so they needed compact address encoding. Putting full 24 bit addresses in the instructions would have made them too large and wasted a lot of
    space. The largest machine in the first set of 360s was only 512K, 19 address bits.

    It makes programs too large if 24-bit addresses are the _only_ option.
    I think it is a strong requirement that at least some instructions
    can easily address the whole 24-bit address space.
    I wouldn't want every access to have to "work around" the limitations.
    Then add other compact formats later.

    For example, a maximal address mode would be [rBase+rIndex+imm24].
    If rBase = 0 then the PC is the base address.
    If rIndex = 0 then index offset = 0, otherwise rIndex << operand size.

    That gives Reg-rel and PC-rel access to the whole address space.
    And for that address mode it scales the index by the operand size
    so it is useful for array indexing (there are no spare bits for scaling).
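A minimal Python sketch of the [rBase+rIndex+imm24] mode as described above. It assumes that "rIndex << operand size" means a left shift by log2 of the operand size in bytes (the parameter size_log2 is my name for that), and that addresses wrap at 24 bits. With 24-bit wraparound it does not matter whether imm24 is read as signed or unsigned.

```python
ADDR_MASK = 0xFFFFFF  # 24-bit address space


def maximal_address(regs, pc, r_base, r_index, imm24, size_log2):
    """rBase == 0 selects the PC as base; rIndex == 0 selects no index."""
    base = pc if r_base == 0 else regs[r_base]
    index = 0 if r_index == 0 else regs[r_index] << size_log2
    return (base + index + imm24) & ADDR_MASK


regs = [0] * 16
regs[5] = 0x100000  # hypothetical array base
regs[6] = 3         # element number

# Register-relative: word element 3 of a table 0x40 bytes past R5.
reg_rel = maximal_address(regs, pc=0x2000, r_base=5, r_index=6,
                          imm24=0x40, size_log2=2)

# PC-relative: 0xFFFFFC behaves as -4 because of the 24-bit wrap.
pc_rel = maximal_address(regs, pc=0x2000, r_base=0, r_index=0,
                         imm24=0xFFFFFC, size_log2=2)
```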

    A 48-bit instruction would have bits: 8 opcode, 4 function code,
    3 registers: rSrcDst, rBase, rIndex, and a 24-bit offset.
    The function code field is so we don't use up all the opcodes.
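One way the 48-bit layout above could be packed, as a Python sketch. The post gives 8 opcode bits, 4 function bits and a 24-bit offset; the 4-bit width of each register field is inferred from 48 - 8 - 4 - 24 = 12 bits shared by three registers, and the field order is an assumption for illustration.

```python
# (name, width) from most significant to least significant; order is assumed.
FIELDS = [("opcode", 8), ("func", 4), ("r_sd", 4),
          ("r_base", 4), ("r_index", 4), ("imm24", 24)]


def encode48(**values):
    """Pack the named fields into a 48-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode48(word):
    """Unpack a 48-bit integer into a dict of fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


word = encode48(opcode=0x5A, func=0x1, r_sd=2, r_base=3, r_index=4,
                imm24=0x123456)
fields = decode48(word)
```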

    An initial cut at the instructions for function code might be:

    LDW Load word
    LDH Load halfword
    STW Store word
    STH Store halfword

    Integer operations of the form rsd = rsd OP [mem]

    ADD, ADC (add with carry), SUB, SBB (subtract with borrow), MUL, DIV
    AND, OR, XOR

    And in a different opcode/function code

    FLS Float load single
    FLD Float load double
    FSS Float store single
    FSD Float store double

    And operate instructions on single and double for
    FADD, FSUB, FCMP, FMUL, FDIV, etc.

    So they needed a way to put partial addresses in the instructions. Considering the options, I think they did pretty well.

    I agree they also need smaller than the maximal address mode above.
    But those are optimizations to be used when an address fits into
    a compact format. Something like:

    8 opcode, 4 function code, rSrcDst, rBase, imm12

    drops the rIndex register and the offset size, and fits into 32-bits.
    Function codes would be same as before.
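A sketch of how an assembler might choose between the two formats, in Python. It assumes imm12 is an unsigned offset (the post does not say), and the function name is hypothetical; the 4-byte and 6-byte lengths follow from the 32-bit and 48-bit formats above.

```python
def instruction_length(r_index, offset):
    """Return the instruction length in bytes for a memory operand."""
    if r_index == 0 and 0 <= offset < (1 << 12):
        return 4  # compact form: opcode, function, rSrcDst, rBase, imm12
    if 0 <= offset < (1 << 24):
        return 6  # maximal form: adds rIndex and a 24-bit offset
    raise ValueError("offset does not fit in 24 bits")


short_form = instruction_length(r_index=0, offset=100)
needs_wide_offset = instruction_length(r_index=0, offset=4096)
needs_index = instruction_length(r_index=3, offset=100)
```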

    The other possibilities would have
    been some kind of bank switching (which they did on the CTSS 7094) or local/page0 (PDP-5) or segments, all of which would have made programming a lot
    more painful. The scheme they chose meant that you set up a bunch of registers
    to point to your code and various chunks of data, and then efficiently used them
    as base registers to do your work, since it was rare for a single I/O buffer or
    data structure to be bigger than 4K. Arrays might be bigger than that, but that's what index registers are for.

    I don't think that replacing the multiple base registers with a single implicit
    base register as Stephen Fuld has suggested would have been practical. Programs
    all used multiple base registers to address data. If you turned those into index
    registers, now most of your data accesses will in effect be double indexed (implicit base plus explicit index) and the extra trip through the adder was slow on small machines.

    Yes, but many can use the more compact formats.

    One thing I do think they could have done better was branch addressing. Rather
    than B+D like they did for data, make it a 16 bit signed offset, shifted one bit
    left since instructions were halfword aligned. That meant you could jump up to
    64K in either direction without a base register for code. They sort of faked it
    since the usual start to a routine was BALR R,0 to put the PC in register R to
    use as a base register, but they added that kind of relative branch later in S/390. The relative branch is still a single addition, same as B+D, so I would
    think the performance on small machines would be similar.

    I would start with maximal address forms that can reach the whole
    address space, BR and BAL (rLink = R31) with 24-bit offsets,
    and Bcc branch conditional with 16-bit offset, in a 32-bit instruction.
    Then add other branch/jump formats.

  • From John Levine@21:1/5 to All on Sat Jun 21 17:21:30 2025
    According to EricP <[email protected]>:
    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat address space, but all of the actual machines were a lot smaller than that. Memory was
    expensive, so they needed compact address encoding. Putting full 24 bit
    addresses in the instructions would have made them too large and wasted a lot of
    space. The largest machine in the first set of 360s was only 512K, 19 address
    bits.

    It makes programs too large if 24-bit addresses are the _only_ option.
    I think it is a strong requirement that at least some instructions
    can easily address the whole 24-bit address space.
    I wouldn't want every access to have to "work around" the limitations.
    Then add other compact formats later. ...

    IBM sort of agrees with you, since they added instructions like that to S/360 and
    zSeries, but it was 30 years later.

    The 360 was built from SLT, a few individual transistors mounted on a carrier, and hand-strung core memory. The /30 had 8 bit data paths and kept most of the 360's state including all of the general registers in core, since logic was so expensive. The microengine had 4032 60-bit words of capacitor ROM which had to implement the entire instruction set, the channels, the 1401 emulator, and some diagnostics.

    The question wasn't what features would be nice to have, it was what would provide enough extra performance to be worth the extra cost. Having done a little 360 programming I can tell you that the lack of direct addressing was not
    a big deal. If you needed to reference something that wasn't covered by a base register, you loaded a pointer from memory into a register and used that as another base register. That didn't happen very often, at least if you wrote in assembler or used a decent compiler like Fortran H.


    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From EricP@21:1/5 to John Levine on Sat Jun 21 15:46:28 2025
    John Levine wrote:
    According to EricP <[email protected]>:
    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat address space, but all of the actual machines were a lot smaller than that. Memory was
    expensive, so they needed compact address encoding. Putting full 24 bit
    addresses in the instructions would have made them too large and wasted a lot of
    space. The largest machine in the first set of 360s was only 512K, 19 address
    bits.
    It makes programs too large if 24-bit addresses are the _only_ option.
    I think it is a strong requirement that at least some instructions
    can easily address the whole 24-bit address space.
    I wouldn't want every access to have to "work around" the limitations.
    Then add other compact formats later. ...

    IBM sort of agrees with you, since they added instructions like that to S/360 and
    zSeries, but it was 30 years later.

    The 360 was built from SLT, a few individual transistors mounted on a carrier,
    and hand-strung core memory. The /30 had 8 bit data paths and kept most of the
    360's state including all of the general registers in core, since logic was so
    expensive. The microengine had 4032 60-bit words of capacitor ROM which had to
    implement the entire instruction set, the channels, the 1401 emulator, and some
    diagnostics.

    The question wasn't what features would be nice to have, it was what would provide enough extra performance to be worth the extra cost. Having done a little 360 programming I can tell you that the lack of direct addressing was not
    a big deal. If you needed to reference something that wasn't covered by a base
    register, you loaded a pointer from memory into a register and used that as another base register. That didn't happen very often, at least if you wrote in
    assembler or used a decent compiler like Fortran H.

    I hadn't noticed in scanning the -30 Field Engineering manual that
    they put the integer and float "registers" in core but yes,
    they are 256 bytes in the Auxiliary Storage area with 256 bytes of
    Multiplexer data. Auxiliary Storage is in the 1.5 or 2.0 us access,
    8-bit wide with parity main core memory unit but with its own physical
    address range.

    So just reading an integer or float "register" takes 4 core memory cycles.
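The arithmetic behind that figure, as a small Python sketch. It assumes one byte is transferred per core cycle with no overlap, and uses the 1.5 us and 2.0 us cycle times quoted above; the function name is made up. Under the same assumption a 64-bit double-precision float would take 8 cycles.

```python
def register_read_cost(register_bits=32, path_bits=8, cycle_us=1.5):
    """Return (core cycles, microseconds) to read one register from core."""
    cycles = -(-register_bits // path_bits)  # ceiling division
    return cycles, cycles * cycle_us


fast_core = register_read_cost(cycle_us=1.5)
slow_core = register_read_cost(cycle_us=2.0)
double_float = register_read_cost(register_bits=64, cycle_us=1.5)
```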

  • From Stephen Fuld@21:1/5 to John Levine on Sat Jun 21 22:42:37 2025
    On 6/20/2025 11:40 AM, John Levine wrote:
    According to quadibloc <[email protected]>:
    I disagree with his solution, because index registers are for
    displacing from an address; base registers build an address.
    Locations in memory should be able to be addressed in a static
    manner.

    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat address space, but all of the actual machines were a lot smaller than that. Memory was
    expensive, so they needed compact address encoding. Putting full 24 bit addresses in the instructions would have made them too large and wasted a lot of
    space. The largest machine in the first set of 360s was only 512K, 19 address bits.

    So they needed a way to put partial addresses in the instructions. Considering
    the options, I think they did pretty well. The other possibilities would have been some kind of bank switching (which they did on the CTSS 7094) or local/page0 (PDP-5) or segments, all of which would have made programming a lot
    more painful. The scheme they chose meant that you set up a bunch of registers
    to point to your code and various chunks of data, and then efficiently used them
    as base registers to do your work, since it was rare for a single I/O buffer or
    data structure to be bigger than 4K. Arrays might be bigger than that, but that's what index registers are for.

    I don't think that replacing the multiple base registers with a single implicit
    base register as Stephen Fuld has suggested would have been practical. Programs
    all used multiple base registers to address data. If you turned those into index
    registers, now most of your data accesses will in effect be double indexed (implicit base plus explicit index) and the extra trip through the adder was slow on small machines.

    That is a good point. I need to think about it some more. If the
    "double indexing" of RS and SI instructions added more overhead than was
    saved by, for example, the 16 bit displacement in RX instructions, and
    the instructions to keep reloading the base registers, that would be a
    big, probably fatal, drawback.


    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Stephen Fuld@21:1/5 to Stephen Fuld on Tue Jun 24 10:58:49 2025
    On 6/21/2025 10:42 PM, Stephen Fuld wrote:
    On 6/20/2025 11:40 AM, John Levine wrote:
    According to quadibloc  <[email protected]>:
    I disagree with his solution, because index registers are for
    displacing from an address; base registers build an address.
    Locations in memory should be able to be addressed in a static
    manner.

    That makes sense now, not so much in 1963.

    S/360 architecture had two incompatible goals. It had a 24 bit flat
    address space, but all of the actual machines were a lot smaller than
    that. Memory was expensive, so they needed compact address encoding.
    Putting full 24 bit addresses in the instructions would have made them
    too large and wasted a lot of space. The largest machine in the first
    set of 360s was only 512K, 19 address bits.

    So they needed a way to put partial addresses in the instructions.
    Considering the options, I think they did pretty well. The other
    possibilities would have been some kind of bank switching (which they
    did on the CTSS 7094) or local/page0 (PDP-5) or segments, all of which
    would have made programming a lot more painful. The scheme they chose
    meant that you set up a bunch of registers to point to your code and
    various chunks of data, and then efficiently used them as base
    registers to do your work, since it was rare for a single I/O buffer or
    data structure to be bigger than 4K. Arrays might be bigger than that,
    but that's what index registers are for.

    I don't think that replacing the multiple base registers with a single
    implicit base register as Stephen Fuld has suggested would have been
    practical. Programs all used multiple base registers to address data.
    If you turned those into index registers, now most of your data
    accesses will in effect be double indexed (implicit base plus explicit
    index) and the extra trip through the adder was slow on small machines.

    That is a good point.  I need to think about it some more. If the
    "double indexing" of RS and SI instructions added more overhead than was
    saved by, for example, the 16 bit displacement in RX instructions, and
    the instructions to keep reloading the base registers, that would be a
    big, probably fatal, drawback.

    Well, I have thought about it more, and came to a revelation. For the
    RS and SI instructions, the displacement is twelve bits, which is 4K.
    The implicit base register points to a page in memory, which is itself
    4K. So to "add" the displacement and the implicit base, you don't
    really need to do an add - simply concatenate the contents of the
    implicit base register with the contents of the displacement. Then a
    single trip through the adder to add the contents of the index register
    is all that is required.
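A minimal Python sketch of the concatenation idea above. It assumes the implicit base register holds a 12-bit page number for a 4K-aligned page in a 24-bit space (my reading of "points to a page"); the function name is hypothetical.

```python
PAGE_BITS = 12  # 4K page, matching the 12-bit displacement


def prefixed_address(base_page, disp12, index=0):
    """Concatenate page number and displacement, then add the index once."""
    if not 0 <= disp12 < (1 << PAGE_BITS):
        raise ValueError("displacement must fit in 12 bits")
    prefixed = (base_page << PAGE_BITS) | disp12  # wiring only, no adder
    return (prefixed + index) & 0xFFFFFF          # the single real addition


no_index = prefixed_address(0x12, 0x7FC)
with_index = prefixed_address(0x12, 0x7FC, index=0x10)

# With a 4K-aligned base, OR and ADD give the same result because the low
# 12 bits of the base are zero, so no carry can occur.
same_as_add = all(((p << PAGE_BITS) | d) == ((p << PAGE_BITS) + d)
                  for p in (0, 1, 0x12, 0xFFF) for d in (0, 1, 0x7FC, 0xFFF))
```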


    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From John Levine@21:1/5 to All on Tue Jun 24 19:31:00 2025
    According to Stephen Fuld <[email protected]d>:
    Well, I have thought about it more, and came to a revelation. For the
    RS and SI instructions, the displacement is twelve bits, which is 4K.
    The implicit base register points to a page in memory, which is itself
    4K. So to "add" the displacement and the implicit base, you don't
    really need to do an add - simply concatenate the contents of the
    implicit base register with the contents of the displacement. Then a
    single trip through the adder to add the contents of the index register
    is all that is required.

    Doing it via prefixing is an interesting thought. It doesn't quite work because the memory protection pages were 2K, but I suppose you could
    fudge it somehow, an extra addition if the offset were more than 2K which
    it usually wasn't.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
