• Re: New ISA board to play with transputers

    From Don Y@21:1/5 to Oscar Toledo G. on Sat Jul 5 23:12:10 2025
    On 7/4/2025 3:30 PM, Oscar Toledo G. wrote:
    I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay. I started with a prototype wired board on an ISA development card, and then I made a proper PCB in three iterations as I solved some bugs.

    The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and an AMD K5).

    The history of development is available at https://nanochess.org/transputer_board.html

    The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Excellent! What did you learn from the experience (besides the
    perils of rushing a PCB)? I.e., what value (or lack thereof) did the transputer offer?

    Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
    used GPIOs to twiddle the hardware -- and a USB interface to talk
    to it? Or, was the ISA bus an important asset?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerhard Hoffmann@21:1/5 to All on Sun Jul 6 11:37:54 2025
    Am 06.07.25 um 08:12 schrieb Don Y:
    On 7/4/2025 3:30 PM, Oscar Toledo G. wrote:
    I've developed an ISA board to test some transputer boards (TRAM) I
    bought in
    eBay, I started with a prototype wired board on an ISA development
    card, and then I made a proper PCB in three iterations as I solved
    some bugs.

    The ISA connector was just because I have several old PC motherboards
    (80286,
    80486, a Pentium MMX, and a AMD K5)

    The history of development is available at
    https://nanochess.org/transputer_board.html

    The schematics and PCB are available at
    https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Excellent!  What did you learn from the experience (besides the
    perils of rushing a PCB)?  I.e., what value (or lack thereof) did the transputer offer?

    Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
    used GPIOs to twiddle the hardware -- and a USB interface to talk
    to it?  Or, was the ISA bus an important asset?

    In a previous life I had quite a huge T800 Transputer cluster and also
    did some designs that connected to it.
    The ISA bus was not important, but there was a link adaptor
    chip (C11? - where is my bottle of Gerontol Forte?) that had an
    SRAM-like "foreign" side that made it easy to handle.

    In <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
    the link chip is between the Western Digital SCSI controller and the
    VLSI serial/par IO chip.

    Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy,
    ... Thanks Goddess I had someone to do the board layout in DOS Orcad STD
    on a Compaq 286 :-)

    Occam was fun. Maybe nowadays it would make a bigger impact with a
    substantial number of CPUs on a chip.

    Cheers, Gerhard

  • From Bill Sloman@21:1/5 to Oscar Toledo G. on Sun Jul 6 20:49:44 2025
    On 5/07/2025 8:30 am, Oscar Toledo G. wrote:
    Hi there.

    I've developed an ISA board to test some transputer boards (TRAM) I bought in eBay,
    I started with a prototype wired board on an ISA development card, and then I made
    a proper PCB in three iterations as I solved some bugs.

    The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and a AMD K5)

    The history of development is available at https://nanochess.org/transputer_board.html

    The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Enjoy it!

    This is very much legacy electronics. The transputer was a nice idea
    when it was invented, but there are now (and pretty much always were)
    different and better ways of solving the problems it addressed.

    https://en.wikipedia.org/wiki/Transputer

    --
    Bill Sloman, Sydney

  • From Don Y@21:1/5 to All on Sun Jul 6 04:58:09 2025
    In a previous life I had quite huge a T800 Tranputer cluster and also
    did some designs that connected to it.
    The ISA bus was not important, but there was a link adaptor
    chip (C11?  - where is my bottle of Gerontol Forte?) that had a
    SRAM-alike "foreign" side that made it easy to handle.

    In <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/  >
    the link chip is between the Western Digital SCSI controller and the
    VLSI serial/par IO chip.

    Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ... Thanks Goddess I had someone to do the board layout in DOS Orcad STD
    on a Compaq 286  :-)

    Occam was fun. Maybe nowadays it would make a bigger impact with a substantial number of CPUs on a chip.

    But there have been countless (for small values of countless) concurrent
    and parallel programming languages (as well as languages with memory
    models that can usurp that ability).

    People seem largely incapable of decomposing "programs" into concurrent activities *within* a language and, instead, seem to rely on mechanisms
    outside the language (e.g., OS-hosted). My take on it is that
    fine-grained concurrency is "too much detail" for most developers to
    manage (except on special case applications).
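    [A minimal sketch of what decomposing a program into concurrent
    activities *within* the language can look like, loosely in the style of
    occam's PAR and channels. It uses Python threads and bounded queues as
    stand-ins; the names `producer`, `squarer`, `run_pipeline` and the `DONE`
    sentinel are hypothetical, and a queue of size 1 only approximates a
    synchronous occam channel.]

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, not part of any standard


def producer(out_ch, n):
    """Send the integers 0..n-1 down the channel, then signal completion."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    """Read values until DONE arrives, forwarding each value squared."""
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)


def run_pipeline(n):
    """Run both stages concurrently and collect the squared values in order."""
    a = queue.Queue(maxsize=1)  # small bound: sender blocks until receiver keeps up
    b = queue.Queue(maxsize=1)
    stages = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results
```

    [The stages share nothing except the two channels, which is the
    discipline occam enforced; here it is only a convention.]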

    [Of course, applications that are inherently SIMD/MIMD can be special-cased. But, the market has a sh*tload of applications that aren't so obviously so
    and should be able to benefit from concurrency and parallelism. Designing
    an application to fit WELL a multicore processor is a lot harder than it
    seems it should be!]

    Hence, we let compilers sort out where things can happen "in parallel"
    and free ourselves from those minutiae, looking at parallelism/concurrency
    in the model *design*, at a higher level of abstraction, instead.

    As for the transputer hardware, it seemed to not provide enough, soon enough.

    Another idea that was bulldozed away by less sophisticated -- but more
    widely available -- solutions.

    [E.g., why did the "pure" memory segmentation model fail to evolve beyond
    the limited implementations initially offered? Why paged MMUs? etc.]

  • From john larkin@21:1/5 to All on Sun Jul 6 06:44:20 2025
    On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
    wrote:

    In a previous life I had quite huge a T800 Tranputer cluster and also
    did some designs that connected to it.
    The ISA bus was not important, but there was a link adaptor
    chip (C11? - where is my bottle of Gerontol Forte?) that had a
    SRAM-alike "foreign" side that made it easy to handle.

    In
    <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
    the link chip is between the Western Digital SCSI controller and the
    VLSI serial/par IO chip.

    Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
    Thanks Goddess I had someone to do the board layout in DOS Orcad STD
    on a Compaq 286 :-)

    Occam was fun. Maybe nowadays it would make a bigger impact with a
    substantial number of CPUs on a chip.

    But there have been countless (for small values of countless) concurrent
    and parallel programming languages (as well as languages with memory
    models that can usurp that ability).

    People seem largely incapable of decomposing "programs" into concurrent
    activities *within* a language and, instead, seem to rely on mechanisms
    outside the language (e.g., OS-hosted). My take on it is that
    fine-grained concurrency is "too much detail" for most developers to
    manage (except on special case applications).

    [Of course, applications that are inherently SIMD/MIMD can be special-cased.
    But, the market has a sh*tload of applications that aren't so obviously so
    and should be able to benefit from concurrency and parallelism. Designing
    an application to fit WELL a multicore processor is a lot harder than it
    seems it should be!]

    Hence, we let compilers sort out where things can happen "in parallel"
    and free ourselves from that minutiae. Looking at parallelism/concurrency
    in the model *design* at a higher level of abstraction, instead.

    As for the transputer hardware, it seemed to not provide enough, soon enough.

    Another idea that was bulldozed away by less sophisticated -- but more
    widely available -- solutions.

    [E.g., why did the "pure" memory segmentation model fail to evolve beyond
    the limited implementations initially offered? Why paged MMUs? etc.]

    Since CPU cores are trivial nowadays - they cost a few cents each -
    the transputer concept may make sense again. We rely on an OS and
    compiler tricks to get apparent parallelism, and the price is
    complexity and bugs.

    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

  • From Theo@21:1/5 to john larkin on Sun Jul 6 17:09:25 2025
    john larkin <[email protected]> wrote:
    Since CPU cores are trivial nowadays - they cost a few cents each -
    the transputer concept may make sense again. We rely on an OS and
    compiler tricks to get apparent parallelism, and the price is
    complexity and bugs.

    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

    Intel tried that:
    https://en.wikipedia.org/wiki/Xeon_Phi

    (obviously using x86 was a bad idea, but apart from that...)

    The issue is one of memory capacity and bandwidth. Many applications have a large (GB) dataset that doesn't partition nicely up between multiple nodes.

    Even the largest FPGAs tend to have MB-scale amounts of memory on them, not
    GB, because the memory density of a dedicated DRAM chip is so much better
    than making on-chip BRAMs. It turns out to be more efficient to use a large external DRAM and drive it in a highly parallel way, pumping data through a GPU-style core, than it is to have lots of little cores individually
    fetching single words from their local BRAM. With that model you also need
    a fabric for the little cores to communicate, while with a big DRAM you get inter-core/thread communication for free - you just arrange to write to a different part of the shared dataset and the next consumer picks it up.
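    A minimal sketch of the "communication for free" point, assuming a plain
    Python list stands in for the big shared DRAM and threads stand in for
    cores. The function name `run_shared` and its parameters are hypothetical.
    Each worker writes only to its own slice of the one buffer, and the
    consumer simply reads the buffer afterwards; no inter-core fabric or
    message copy is involved.

```python
import threading


def run_shared(n_workers, chunk):
    """Each worker fills its own slice of one shared buffer; a consumer then reads it."""
    shared = [0] * (n_workers * chunk)  # stands in for the large external DRAM

    def worker(idx):
        base = idx * chunk
        for i in range(chunk):
            shared[base + i] = idx  # writes stay inside this worker's slice

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # waiting for the writers is the only synchronization needed here

    return sum(shared)  # the "next consumer" just reads the shared data
```

    With per-core local memory instead, the same hand-off would need an
    explicit transfer from every producer to the consumer, which is the
    fabric cost described above.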

    You can of course put GDDR or HBM on an FPGA, but it's the same problem -
    only a few devices must be shared by numerous cores. Ultimately memory throughput beats latency hands down, especially for large datasets. This
    was not such a problem in the Transputer's day, which is why that
    architecture made sense.

    Theo

  • From Don@21:1/5 to Oscar Toledo G. on Sun Jul 6 15:30:36 2025
    Oscar Toledo G. <[email protected]> wrote:
    Hi there.

    I've developed an ISA board to test some transputer boards (TRAM) I bought in eBay,
    I started with a prototype wired board on an ISA development card, and then I made
    a proper PCB in three iterations as I solved some bugs.

    The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and a AMD K5)

    The history of development is available at https://nanochess.org/transputer_board.html

    The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Enjoy it!

    Great job!

    My agenda includes mastering kicad 9 - including its interface to
    ngspice - as soon as possible. My passion these days is exploiting Zynq
    7000 SoCs to virtualize both lab instruments and radios.
    Here's a sneak peek at my retro RF double balanced mixer work in
    progress: <https://crcomp.net/mixer/> Although its hardware lash-up
    was finished in no time, its webpage and rhetoric still need work.

    Danke,

    --
    Don, KB7RPU, https://www.qsl.net/kb7rpu
    There was a young lady named Bright Whose speed was far faster than light;
    She set out one day In a relative way And returned on the previous night.

  • From Bill Sloman@21:1/5 to john larkin on Mon Jul 7 01:21:18 2025
    On 6/07/2025 11:44 pm, john larkin wrote:
    On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
    wrote:

    In a previous life I had quite huge a T800 Tranputer cluster and also
    did some designs that connected to it.
    The ISA bus was not important, but there was a link adaptor
    chip (C11?  - where is my bottle of Gerontol Forte?) that had a
    SRAM-alike "foreign" side that made it easy to handle.

    In
    <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/  >
    the link chip is between the Western Digital SCSI controller and the
    VLSI serial/par IO chip.

    Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
    Thanks Goddess I had someone to do the board layout in DOS Orcad STD
    on a Compaq 286 :-)

    Occam was fun. Maybe nowadays it would make a bigger impact with a
    substantial number of CPUs on a chip.

    But there have been countless (for small values of countless) concurrent
    and parallel programming languages (as well as languages with memory
    models that can usurp that ability).

    People seem largely incapable of decomposing "programs" into concurrent
    activities *within* a language and, instead, seem to rely on mechanisms
    outside the language (e.g., OS-hosted). My take on it is that
    fine-grained concurrency is "too much detail" for most developers to
    manage (except on special case applications).

    [Of course, applications that are inherently SIMD/MIMD can be special-cased.
    But, the market has a sh*tload of applications that aren't so obviously so
    and should be able to benefit from concurrency and parallelism. Designing
    an application to fit WELL a multicore processor is a lot harder than it
    seems it should be!]

    Hence, we let compilers sort out where things can happen "in parallel"
    and free ourselves from that minutiae. Looking at parallelism/concurrency
    in the model *design* at a higher level of abstraction, instead.

    As for the transputer hardware, it seemed to not provide enough, soon enough.

    Another idea that was bulldozed away by less sophisticated -- but more
    widely available -- solutions.

    [E.g., why did the "pure" memory segmentation model fail to evolve beyond
    the limited implementations initially offered? Why paged MMUs? etc.]

    Since CPU cores are trivial nowadays - they cost a few cents each -
    the transputer concept may make sense again. We rely on an OS and
    compiler tricks to get apparent parallelism, and the price is
    complexity and bugs.

    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

    So all tasks are created equal? And dedicating a CPU to every last one
    of them isn't an over-kill for most of them?

    https://en.wikipedia.org/wiki/Transputer

    does offer a slightly more sophisticated insight into why Inmos
    eventually went bust - actually it was sold to SGS-Thomson (now STMicroelectronics).

    Parallel processing and multitasking are both complicated subjects,
    and one-size-fits-all solutions don't seem to exist.

    People who do special purpose electronic design do tend to have a
    grab-bag of techniques developed to solve other problems for other
    customers - John Fields could solve lots of problems with a 555, but my
    feeling was that a lot of his solutions were sub-optimal.

    --
    Bill Sloman, Sydney

  • From Tauno Voipio@21:1/5 to Don on Sun Jul 6 19:39:08 2025
    On 6.7.2025 18.30, Don wrote:
    Oscar Toledo G. <[email protected]> wrote:
    Hi there.

    I've developed an ISA board to test some transputer boards (TRAM) I bought in eBay,
    I started with a prototype wired board on an ISA development card, and then I made
    a proper PCB in three iterations as I solved some bugs.

    The ISA connector was just because I have several old PC motherboards (80286,
    80486, a Pentium MMX, and a AMD K5)

    The history of development is available at
    https://nanochess.org/transputer_board.html

    The schematics and PCB are available at
    https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Enjoy it!

    Great job!

    My agenda includes mastering kicad 9 - including its interface to
    ngspice - as soon as possible. My passion these days is exploiting Zynq
    7000 SoCs to virtualize both lab instruments and radios.
    Here's a sneak peek at my retro RF double balanced mixer work in
    progress: <https://crcomp.net/mixer/> Although its hardware lash-up
    was finished in no time, its webpage and rhetoric still need work.

    Danke,


    Don,

    You do need to use the center tap on TR2 to get anything
    but a sophisticated short circuit.

    --

    -TV

  • From john larkin@21:1/5 to [email protected] on Sun Jul 6 10:46:03 2025
    On 06 Jul 2025 17:09:25 +0100 (BST), Theo
    <[email protected]> wrote:

    john larkin <[email protected]> wrote:
    Since CPU cores are trivial nowadays - they cost a few cents each -
    the transputer concept may make sense again. We rely on an OS and
    compiler tricks to get apparent parallelism, and the price is
    complexity and bugs.

    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

    Intel tried that:
    https://en.wikipedia.org/wiki/Xeon_Phi

    (obviously using x86 was a bad idea, but apart from that...)

    The issue is one of memory capacity and bandwidth. Many applications have a
    large (GB) dataset that doesn't partition nicely up between multiple nodes.

    Even the largest FPGAs tend to have MB-scale amounts of memory on them, not
    GB, because the memory density of a dedicated DRAM chip is so much better
    than making on-chip BRAMs. It turns out to be more efficient to use a large
    external DRAM and drive it in a highly parallel way, pumping data through a
    GPU-style core, than it is to have lots of little cores individually
    fetching single words from their local BRAM. With that model you also need
    a fabric for the little cores to communicate, while with a big DRAM you get
    inter-core/thread communication for free - you just arrange to write to a
    different part of the shared dataset and the next consumer picks it up.

    You can of course put GDDR or HBM on an FPGA, but it's the same problem -
    only a few devices must be shared by numerous cores. Ultimately memory
    throughput beats latency hands down, especially for large datasets. This
    was not such a problem in the Transputer's day, which is why that
    architecture made sense.

    Theo

    Seems a shame to have an x86 core wasting time handling ethernet and
    printers and mice and memory sticks when it could be doing better
    things like running Spice.

    My Windows 11 thing is running hundreds of processes right now. That's
    crazy.

    Computing is a mess. A new hardware architecture would at least
    suggest a fresh start.

  • From Don Y@21:1/5 to Theo on Sun Jul 6 11:57:41 2025
    On 7/6/2025 9:09 AM, Theo wrote:
    The issue is one of memory capacity and bandwidth. Many applications have a large (GB) dataset that doesn't partition nicely up between multiple nodes.

    You can of course put GDDR or HBM on an FPGA, but it's the same problem - only a few devices must be shared by numerous cores. Ultimately memory throughput beats latency hands down, especially for large datasets. This
    was not such a problem in the Transputer's day, which is why that architecture made sense.

    Exactly. Partitioning an application "by task" only makes sense if the tasks are orthogonal -- AND can have their own dedicated resources. The memory interface determines performance. In software "communication" drives performance and complexity -- the more things have to share the poorer
    the design.

    [The exception: SIMD]

    So, you need to think about the medium used for interconnect fabric.
    Shared memory has pitfalls, too -- because it has such a high bandwidth
    and protection mechanisms -- in software or hardware -- tend to want to
    be light weight (e.g., you wouldn't want a monitor per datum).

    Remember, the mantra is to partition to MINIMIZE sharing.

    If the "sharers" need a fat pipe between them, then the medium must
    directly (or indirectly) support that.

    If the "sharers" require a short pipe, then that becomes an issue
    (which need not be mutually exclusive with the pipe's width).

    So, you have to partition the application to ensure things can
    communicate "fast enough" and "soon enough" -- and still address
    the folly of "one shared address space".

    This can lead to suboptimal hardware implementations. E.g., I have a CPU
    per camera instead of a "camera CPU" handling ALL cameras. So, 20 copies
    of the same hardware and software all doing essentially the same thing;
    but, I can just as easily have *40* copies (whereas a "camera CPU" would eventually be taxed computationally -- will the scale of your
    application change, over time? how will that affect your partitioning?).

    And, once you have an abundance of computational ability, you then need
    to address WHAT gets done WHERE and how that decision will change, over
    time. E.g., if the application isn't currently using a camera, then how
    can the resources PHYSICALLY set aside for that camera be used to achieve
    some other goal? (Ditto any other physical I/Os) Will your rejiggering
    of "virtual" resource allocation still "fit" the above communication criteria?

    The appeal (from a complexity, reliability, maintainability point of view)
    of a well-partitioned system is illusory given practical constraints.

    Think of all the MIPS wasted by the CPU in your thermostat that the
    CPU in your refrigerator could supply! And, how many are wasted,
    there, that could be used by your television/STB? Ah, but SO much
    easier to design something that is JUST a thermostat or JUST a
    refrigerator... than to design something that can be all of the above!

  • From Don@21:1/5 to Tauno Voipio on Sun Jul 6 20:48:06 2025
    Tauno Voipio wrote:
    Don wrote:
    Oscar Toledo G. wrote:
    Hi there.

    I've developed an ISA board to test some transputer boards (TRAM) I bought in eBay,
    I started with a prototype wired board on an ISA development card, and then I made
    a proper PCB in three iterations as I solved some bugs.

    The ISA connector was just because I have several old PC motherboards (80286,
    80486, a Pentium MMX, and a AMD K5)

    The history of development is available at
    https://nanochess.org/transputer_board.html

    The schematics and PCB are available at
    https://github.com/nanochess/transputer/pcb

    In the same git you can get my operating system developed in 1993-1996.

    Enjoy it!

    Great job!

    My agenda includes mastering kicad 9 - including its interface to
    ngspice - as soon as possible. My passion these days is exploiting Zynq
    7000 SoCs to virtualize both lab instruments and radios.
    Here's a sneak peek at my retro RF double balanced mixer work in
    progress: <https://crcomp.net/mixer/> Although its hardware lash-up
    was finished in no time, its webpage and rhetoric still need work.

    Don,

    You do need to use the center tap on TR2 to get anything
    but a sophisticated short circuit.

    The schematic is a work in progress. Its 100 ohm load resistor is
    missing along with terminators for the Local Oscillator and Radio
    Frequency inputs and Intermediate Frequency output.
    Oscilloscope outputs from the ADALM2000 and a Tektronix now
    appear at my webpage: <https://crcomp.net/mixer/> . The virtualized
    scope does a good job of tracking its analog analog (so to speak).

    The LTSpice simulation in AD's tutorial looks great. AD's practical
    circuit, as shown in its tutorial, leans a little towards a
    "sophisticated short circuit." Ergo, the motivation for my webpage.

    Danke,

    --
    Don, KB7RPU, https://www.qsl.net/kb7rpu
    There was a young lady named Bright Whose speed was far faster than light;
    She set out one day In a relative way And returned on the previous night.

  • From Theo@21:1/5 to john larkin on Mon Jul 7 17:21:11 2025
    john larkin <[email protected]> wrote:

    Seems a shame to have an x86 core wasting time handling ethernet and
    printers and mice and memory sticks when they could be doing better
    things like running Spice.

    Many of those things are already happening outboard anyway - all those
    things have processors in them. What the CPU is doing is largely managing
    the data transfer to and from the device. eg the printer speaks PCL or Postscript and the OS's workload is limited to firing the job at the printer (USB/network) and the printer's CPU then decides where to put the ink on
    the page.

    You can delegate that management oversight to another core if you like, but then you need management oversight of *that* core.

    My Windows 11 thing is running hundreds of processes right now. That's
    crazy.

    Windows problems :-) But many of those things don't need to take much CPU - they're ready to handle print jobs when you press Ctrl-P, but the rest of
    the time they're ticking along in the background not taking much resources because they don't need them.

    The OS is running thousands of kernel threads, but they're mostly blocked
    (not scheduled) until they need to do something. One thread per 'thing',
    more or less. All that thread needs is a few hundred bytes for its register state so the impact is small.
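    A small sketch of the "mostly blocked" point, assuming Python threads as
    stand-ins for the OS's service threads (the function name
    `blocked_thread_demo` and its parameters are hypothetical). Many threads
    wait on one event; while they wait, the process accumulates almost no CPU
    time, and they all run once the event fires.

```python
import threading
import time


def blocked_thread_demo(n_threads=200, idle_seconds=0.2):
    """Start n_threads that block on an event; return (threads woken, CPU seconds used while idle)."""
    wake = threading.Event()
    woke = []
    lock = threading.Lock()

    def service(idx):
        wake.wait()  # blocked here: not scheduled until the event is set
        with lock:
            woke.append(idx)

    threads = [threading.Thread(target=service, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()

    cpu_before = time.process_time()
    time.sleep(idle_seconds)  # every service thread is waiting during this interval
    cpu_idle = time.process_time() - cpu_before

    wake.set()  # the "Ctrl-P" moment: now they have something to do
    for t in threads:
        t.join()
    return len(woke), cpu_idle
```

    The CPU figure covers the whole process, so it shows the cost of the
    waiting threads together rather than per thread; memory per thread is a
    separate matter that this sketch does not measure.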

    Computing is a mess. A new hardware architecture would at least
    suggest a fresh start.

    Non-Windows, non-x86 architectures are available...

    Theo

  • From john larkin@21:1/5 to [email protected] on Tue Jul 8 03:10:22 2025
    On 07 Jul 2025 17:21:11 +0100 (BST), Theo
    <[email protected]> wrote:

    john larkin <[email protected]> wrote:

    Seems a shame to have an x86 core wasting time handling ethernet and
    printers and mice and memory sticks when they could be doing better
    things like running Spice.

    Many of those things are already happening outboard anyway - all those
    things have processors in them. What the CPU is doing is largely managing
    the data transfer to and from the device. eg the printer speaks PCL or
    Postscript and the OS's workload is limited to firing the job at the printer
    (USB/network) and the printer's CPU then decides where to put the ink on
    the page.

    You can delegate that management oversight to another core if you like, but
    then you need management oversight of *that* core.

    My Windows 11 thing is running hundreds of processes right now. That's
    crazy.

    Windows problems :-) But many of those things don't need to take much CPU -
    they're ready to handle print jobs when you press Ctrl-P, but the rest of
    the time they're ticking along in the background not taking much resources
    because they don't need them.

    The OS is running thousands of kernel threads, but they're mostly blocked
    (not scheduled) until they need to do something. One thread per 'thing',
    more or less. All that thread needs is a few hundred bytes for its register
    state so the impact is small.

    Computing is a mess. A new hardware architecture would at least
    suggest a fresh start.

    Non-Windows, non-x86 architectures are available...

    Theo

    The x86 is nearly the peak of the silly concept that the CPU is a big
    deal. Intel is heavily invested in that idea. ARM and RISC-V cores are
    fast and cheap and basically trivial amounts of silicon. We can have a
    zillion CPUs on a chip, so we don't benefit from the brutal complexity and
    inefficiency of trying to share just a few big ugly CPUs among
    hundreds of processes.

    We use the RP2040 chip in some products. It's a dual-core 133 MHz ARM
    with lots of cute peripherals, including hardware state machines.
    It's 75 cents in any quantity. On the new version, the RP2350, they
    threw in a couple of RISC-V cores just for fun.

  • From john larkin@21:1/5 to [email protected] on Tue Jul 8 08:27:11 2025
    On Tue, 8 Jul 2025 11:18:49 +0100, John R Walliker
    <[email protected]> wrote:

    On 08/07/2025 11:10, john larkin wrote:
    On 07 Jul 2025 17:21:11 +0100 (BST), Theo
    <[email protected]> wrote:

    john larkin <[email protected]> wrote:

    Seems a shame to have an x86 core wasting time handling ethernet and
    printers and mice and memory sticks when they could be doing better
    things like running Spice.

    Many of those things are already happening outboard anyway - all those
    things have processors in them. What the CPU is doing is largely managing
    the data transfer to and from the device. eg the printer speaks PCL or
    Postscript and the OS's workload is limited to firing the job at the printer
    (USB/network) and the printer's CPU then decides where to put the ink on
    the page.

    You can delegate that management oversight to another core if you like, but
    then you need management oversight of *that* core.

    My Windows 11 thing is running hundreds of processes right now. That's
    crazy.

    Windows problems :-) But many of those things don't need to take much CPU -
    they're ready to handle print jobs when you press Ctrl-P, but the rest of
    the time they're ticking along in the background not taking much resources
    because they don't need them.

    The OS is running thousands of kernel threads, but they're mostly blocked
    (not scheduled) until they need to do something. One thread per 'thing',
    more or less. All that thread needs is a few hundred bytes for its register
    state so the impact is small.

    Computing is a mess. A new hardware architecture would at least
    suggest a fresh start.

    Non-Windows, non-x86 architectures are available...

    Theo

    The x86 is nearly the peak of the silly concept that the CPU is a big
    deal. Intel is heavily invested in that idea. ARM and RISC-V cores are
    fast and cheap and basically trivial amounts of silicon. We can have a
    zillion CPUs on a chip so don't benefit from the brutal complexity and
    inefficiency of trying to share just a few big ugly CPUs among
    hundreds of processes.

    We use the RP2040 chip in some products. It's a dual-core 133 MHz ARM
    with lots of cute peripherals, including hardware state machines.
    It's 75 cents in any quantity. On the new version, the RP2350, they
    threw in a couple of RISC-V cores just for fun.

    Maybe "just for fun" but it might give them a stronger position
    when negotiating royalty rates with ARM.
    John


    Yes, my thoughts too.

    I wonder how much they pay to license the ARM. It can't be much, with
    two cores on a 75 cent chip.

    Intel stock is at about half its peak. Market cap is $101 billion.

    Nvidia stock is at an all-time peak and worth $3.9 trillion.

    CISC is *so* last millennium.

  • From Don Y@21:1/5 to All on Tue Jul 8 11:10:05 2025
    Many of those things are already happening outboard anyway - all those
    things have processors in them.

    You don't have to look to "peripherals" to find additional processors.
    Many NICs are now implemented as CPUs. Ditto video subsystems (GPUs,
    of course). The keyboard *interface* on the original PC was a microcontroller. Ditto sound subsystems, etc.

    What the CPU is doing is largely managing
    the data transfer to and from the device. eg the printer speaks PCL or Postscript and the OS's workload is limited to firing the job at the printer (USB/network) and the printer's CPU then decides where to put the ink on
    the page.

    You can delegate that management oversight to another core if you like, but then you need management oversight of *that* core.

    Specialized processors (co-processors, if you will) are usually optimized for specific sorts of uses. The CPU in a NIC is probably piss poor at doing floating point math!

    Windows problems :-) But many of those things don't need to take much CPU - they're ready to handle print jobs when you press Ctrl-P, but the rest of
    the time they're ticking along in the background not taking much resources because they don't need them.

    It's "good design". Folks who write "one big program" are living in the 60's. Your goal (for reliability and correctness) is always to compartmentalize and reduce (in complexity) "tasks". Reduce the amount of sharing AND make it
    very visible: global variables are the sign of an immature development
    style (how do you know ONLY the "right" accesses are happening? why
    SHOULD a particular activity NEED to see some datum?)

    We have interrupt service routines. Aren't they essentially separate processes?

    For more than 40 years, I've used one supervisory process per ISR;
    NOTHING should be talking to the ISR besides that process ("discipline").
    This sort of structure makes it easy to impose additional layers
    of functionality on those resources.

    [E.g., in the days of serial terminals, how could you ensure the current
    time of day would be displayed in the lower right corner of the screen
    if "anybody" could push characters out the serial port? If, instead,
    you require them to go through an agent, then you can have that agent
    present a "message oriented" interface and have *it* systematically
    access the underlying driver/ISR]
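
    A minimal sketch of that agent idea, in Python for brevity. Everything
    named here (FakeSerialPort, TerminalAgent, the 80x24 screen, the VT100
    save/restore-cursor escapes) is an assumption made for illustration,
    not a description of any real system. Clients post messages to a queue
    and only the agent's thread ever writes to the port, so a clock update
    cannot be interleaved with anyone else's characters.

```python
import queue
import threading


class FakeSerialPort:
    """Hypothetical stand-in for the real driver/ISR; records each write."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class TerminalAgent:
    """The only code allowed to touch the port; clients send it messages."""

    COLS, ROWS = 80, 24  # assumed screen size

    def __init__(self, port):
        self._port = port
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    # Message-oriented interface: callers never see the port itself.
    def print_text(self, text):
        self._inbox.put(("text", text))

    def show_clock(self, hhmm):
        self._inbox.put(("clock", hhmm))

    def stop(self):
        self._inbox.put(("stop", None))
        self._thread.join()

    def _run(self):
        while True:
            kind, payload = self._inbox.get()
            if kind == "stop":
                return
            if kind == "text":
                self._port.write(payload)
            elif kind == "clock":
                # Save cursor, jump to the lower right corner, draw the
                # time, restore cursor: one indivisible write to the port.
                col = self.COLS - len(payload) + 1
                self._port.write(
                    f"\x1b7\x1b[{self.ROWS};{col}H{payload}\x1b8"
                )


def demo():
    port = FakeSerialPort()
    agent = TerminalAgent(port)
    agent.print_text("hello")
    agent.show_clock("12:34")
    agent.print_text(" world")
    agent.stop()
    return port.written
```

    Because the queue is the only way in, adding another layer later
    (logging, paging, a status line) means changing the agent and nothing
    else.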

    The OS is running thousands of kernel threads, but they're mostly blocked (not scheduled) until they need to do something. One thread per 'thing', more or less. All that thread needs is a few hundred bytes for its register state so the impact is small.

    But, threads share an address space. So, can screw with each other.
    A cleaner, more robust design is to use separate process containers
    for each "thing". So, your network stack can't screw up your
    file system code.

    [I think the average Linux release now contains over 1000 bugs! Had
    they shifted to FIXING things instead of adding and refactoring,
    they could actually improve the quality of the codebase! Ah, but
    we can adopt the MS philosophy of just pushing out updates every
    week or two! "Mr Smith, could you please bring your vehicle in
    for service this week? We'd like to upgrade the brakes to
    reduce your stopping distance. And, while we're at it, also
    fix something we broke on your previous visit, LAST week!"]

    [[I love this: <https://lwn.net/Articles/914632/>
    "Also noteworthy is 96c8395e2166 ("spi: Revert modalias changes"),
    which deleted six lines of code and has required 24 fixes thereafter."
    20 years later and they're still finding bugs in THAT release??]]

    [[[Yet another: <https://stack.watch/product/linux/> Note
    that vulnerabilities may include *design* flaws and not just
    "bugs"]]]

    We have delightful hardware available, nowadays. And, affordable.
    You no longer have to worry about whether to use a MPY opcode
    or have to write a mult() routine. And, can use floating point
    almost as easily as integer and fixed point math. You can
    use virtual memory, EDAC to improve hardware "reliability",
    isolated process containers, etc.

    Yet, people still obsess over "make it faster". Um, how about make
    it RIGHT, first? Lots of "best practices" have performance hits
    (e.g., isolated process containers, lack of globals, capability
    based designs, etc.) but dramatically increase the quality of the code.
    Yet, folks still opt for fast-and-broken over slowER-and-correct!
    Really, how many orders of magnitude FASTER does the hardware
    have to get before you stop obsessing over speed? Does the
    PRODUCT COST of a system vary THAT much that saving a few pennies on
    a slower processor is really going to translate to additional
    sales or profits? What cost do you associate with (lack of) quality?

  • From Don Y@21:1/5 to Oscar Toledo G. on Sun Jul 13 18:40:00 2025
    On 7/13/2025 2:27 PM, Oscar Toledo G. wrote:

    In the same git you can get my operating system developed in 1993-1996.

    Excellent! What did you learn from the experience (besides the
    perils of rushing a PCB)? I.e., what value (or lack thereof) did the
    transputer offer?

    The transputer doesn't actually offer any value. I only did it to re-run my old software, and because I was curious about running Occam. Installing Occam and trying to get things running was not as amazing an experience as I expected.

    Even disappointments are learning experiences.

    Once the Pentium was out, the transputer was basically obsolete.

    Yes. My point is that "inferior" solutions can win if they become
    ubiquitous. In a sense, the x86 family did "computing" as much harm as good.

    It definitely stifled far more promising (big iron) architectures
    that could have made a more meaningful impact on the state of the art!

    Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
    used GPIOs to twiddle the hardware -- and a USB interface to talk
    to it? Or, was the ISA bus an important asset?

    It is my first "complicated" circuit board, so I didn't want to bite off more than necessary.

    The ISA is an important asset, because there's no point in putting a transputer board on a Pentium machine, but I have an 80286 where the transputer
    speed is obvious.

    I have several "add in" coprocessors which, for their time, were worth the effort. (Anyone remember Weitek FPUs?)

    My point, though, was as to whether going with a USB-based interface
    might have made your effort easier? I.e., use a COTS board (Arduino, etc.)
    to GIVE you the interface to the host PC (so you don't have to develop your
    own USB stack) and just make a small adapter board that goes from the (e.g., Arduino) to the transputer.

    Or, use a network interface on such a COTS board to talk to the MCU and
    let it talk to the transputer.

    [I've no idea how fat a pipe the transputer needs to do anything meaningful
    on the PC]

    The advantage of either approach would be your "interface" would be more portable to different machines -- even non-PCs.

  • From Don Y@21:1/5 to Oscar Toledo G. on Sun Jul 13 21:34:20 2025
    On 7/13/2025 8:46 PM, Oscar Toledo G. wrote:
    My point, though, was as to whether going with a USB-based interface
    might have made your effort easier? I.e., use a COTS board (Arduino, etc.)
    to GIVE you the interface to the host PC (so you don't have to develop your
    own USB stack) and just make a small adapter board that goes from the (e.g.,
    Arduino) to the transputer.

    Or, use a network interface on such a COTS board to talk to the MCU and
    let it talk to the transputer.

    [I've no idea how fat a pipe the transputer needs to do anything meaningful
    on the PC]

    The advantage of either approach would be your "interface" would be more
    portable to different machines -- even non-PCs.

    I get your point now. I never thought about it. But I published the schematics
    in case someone wants to do another type of interface.

    The multiplatform ability currently is my transputer emulator in Javascript. It runs on PC, Mac, or Linux, just visiting my webpage at https://nanochess.org/transputer_emulator.html

    Yes. It is amusing how we can run almost cycle-accurate emulations, now,
    just due to the advances in process technology.

    I think there is already a very good Raspberry Pi project for transputers where you can connect a transputer, or use a Pi to emulate the transputer.

    OK. I wouldn't know as Pis aren't of any interest to me.

    Maybe it could be a mathematical coprocessor for Arduino, but it would be too expensive.

    No, I think the whole point is what you've addressed -- for people to be able to explore a technology that is no longer (practically) available. E.g.,
    run MULTICS even though you can't find a real '645.

    Sadly, too much variety and innovation has been lost to boring designs that have taken over markets. To have one's feet firmly planted in the 1970's
    (save for process improvements).

  • From [email protected]@21:1/5 to [email protected] on Fri Aug 1 11:54:44 2025
    In article <104e49f$28h74$[email protected]>,
    Bill Sloman <[email protected]> wrote:
    On 6/07/2025 11:44 pm, john larkin wrote:
    On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
    wrote:

    In a previous life I had quite a huge T800 Transputer cluster and also
    did some designs that connected to it.
    The ISA bus was not important, but there was a link adaptor
    chip (C11?  - where is my bottle of Gerontol Forte?) that had a
    SRAM-alike "foreign" side that made it easy to handle.

    In
    <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
    the link chip is between the Western Digital SCSI controller and the
    VLSI serial/par IO chip.

    Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
    Thanks Goddess I had someone to do the board layout in DOS Orcad STD
    on a Compaq 286  :-)

    Occam was fun. Maybe nowadays it would make a bigger impact with a
    substantial number of CPUs on a chip.

    But there have been countless (for small values of countless) concurrent
    and parallel programming languages (as well as languages with memory
    models that can usurp that ability).

    People seem largely incapable of decomposing "programs" into concurrent
    activities *within* a language and, instead, seem to rely on mechanisms
    outside the language (e.g., OS-hosted). My take on it is that
    fine-grained concurrency is "too much detail" for most developers to
    manage (except on special case applications).

    [Of course, applications that are inherently SIMD/MIMD can be special-cased.
    But, the market has a sh*tload of applications that aren't so obviously so
    and should be able to benefit from concurrency and parallelism. Designing
    an application to fit WELL a multicore processor is a lot harder than it
    seems it should be!]

    Hence, we let compilers sort out where things can happen "in parallel"
    and free ourselves from that minutiae. Looking at parallelism/concurrency
    in the model *design* at a higher level of abstraction, instead.

    As for the transputer hardware, it seemed to not provide enough, soon enough.

    Another idea that was bulldozed away by less sophisticated -- but more
    widely available -- solutions.

    [E.g., why did the "pure" memory segmentation model fail to evolve beyond
    the limited implementations initially offered? Why paged MMUs? etc.]

    Since CPU cores are trivial nowadays - they cost a few cents each -
    the transputer concept may make sense again. We rely on an OS and
    compiler tricks to get apparent parallelism, and the price is
    complexity and bugs.

    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

    So all tasks are created equal? And dedicating a CPU to every last one
    of them isn't an over-kill for most of them?

    This doesn't take into account the architecture of the CPU.
    There is no need to dedicate a CPU to a task, because multiple
    tasks can be mapped within one CPU.


    https://en.wikipedia.org/wiki/Transputer

    does offer a slightly more sophisticated insight into why Inmos
    eventually went bust - actually it was sold to SGS-Thomson (now
    STMicroelectronics).

    The nodes in a parallel system shouldn't be low level but on
    a par with the most cost-effective CPU of the era.
    There was no investment in a 4 GHz T8000 (64 bit) with 1 Gbyte links
    and I bet that it could at least have had a niche for itself.

    I was involved in a geology simulation system with interactive
    graphics for Shell (1990). The alternative was the Cray,
    and the difference in costs was staggering.

    I made a twin counting program as a demonstration. We (HCC)
    had a heterogeneous bunch of 60 transputers in total.
    I borrowed from military and educational institutes for a total
    cluster of 180 transputers. They worked well together, a hotchpotch
    of power supplies and transputer boxes. Imagine 180 386 boxes
    working together.

    https://home.hccnet.nl/a.w.m.van.der.horst/transputer.html

    The Forth compiler available through this page should work
    as long as a transputer link is operational.

    Parallel processing and multitasking are both complicated subjects,
    and one-size-fits-all solutions don't seem to exist.

    People who do special purpose electronic design do tend to have a
    grab-bag of techniques developed to solve other problems for other
    customers - John Fields could solve lots of problems with a 555, but my
    feeling was that a lot of his solutions were sub-optimal.

    I'm firmly convinced that the transputer route is not fully
    explored. Huawei beats Nvidia at AI, not with superior CPUs
    (although that is coming), but with superior inter-CPU communication.


    --
    Bill Sloman, Sydney

    --
    The Chinese government is satisfied with its military superiority over USA.
    The next 5 year plan has as primary goal to advance life expectancy
    over 80 years, like Western Europe.

  • From Don Y@21:1/5 to [email protected] on Fri Aug 1 07:56:08 2025
    On 8/1/2025 2:54 AM, [email protected] wrote:
    Why not have a CPU per task? Each with a decent chunk of dedicated
    fast ram?

    So all tasks are created equal? And dedicating a CPU to every last one
    of them isn't an over-kill for most of them?

    This doesn't take into account the architecture of the CPU.
    There is no need to dedicate a CPU to a task, because multiple
    tasks can be mapped within one CPU.

    Exactly. You either overprovision all of the nodes so the most
    challenging task can meet its performance goals OR leave many
    nodes with fewer resources than they actually need.

    Easier (and tried-and-true) to just use the excess capacity at each
    (and every!) node to handle some other task -- or part thereof.

    You don't get true parallelism but, most often, concurrency is
    sufficient for the /application/.
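
    A toy illustration of that multiplexing, in Python. The names (task,
    run_round_robin) and the notion of a "slice" are assumptions made for
    the sketch, not any particular kernel's API: three tasks of different
    sizes share one "CPU" by yielding after each slice, so none of them
    needs a processor of its own and the small ones simply finish early.

```python
from collections import deque


def task(name, slices, log):
    """A task needing `slices` time slices; it yields the CPU after each."""
    for i in range(slices):
        log.append(f"{name}{i}")  # pretend to do one slice of work
        yield


def run_round_robin(tasks):
    """Multiplex all tasks onto one 'CPU', one slice each in turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task yields
            ready.append(current)  # not finished: back of the queue
        except StopIteration:
            pass                   # finished: drop it


def demo():
    log = []
    run_round_robin([task("A", 3, log), task("B", 1, log), task("C", 2, log)])
    return log
```

    The interleaving is concurrency, not parallelism: only one slice runs
    at any instant, yet every task makes progress.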

    https://en.wikipedia.org/wiki/Transputer

    does offer a slightly more sophisticated insight into why Inmos
    eventually went bust - actually it was sold to SGS-Thomson (now
    STMicroelectronics).

    The nodes in a parallel system shouldn't be low level but on
    a par with the most cost-effective CPU of the era.

    ... subject to the constraint that they can meet the needs of
    the application.

    I recall a design with 1024 8051-class processors. Lots of potential
    MIPS but the communication costs were on a par with the computation.
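
    A back-of-envelope model of why that hurts, in Python. It assumes the
    work divides evenly and that a node cannot overlap communication with
    computation; the function names and the numbers are made up for
    illustration and say nothing about the actual design.

```python
def parallel_efficiency(compute_per_node, comm_per_node):
    """Fraction of each node's time spent on useful computation."""
    return compute_per_node / (compute_per_node + comm_per_node)


def effective_speedup(n_nodes, compute_per_node, comm_per_node):
    """Speedup over one node that does the same work with no communication."""
    return n_nodes * parallel_efficiency(compute_per_node, comm_per_node)
```

    With communication "on a par" with computation,
    effective_speedup(1024, 1.0, 1.0) gives 512.0: half of the nominal
    MIPS go to moving data around, and the fraction only gets worse as the
    communication share grows.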

    There was no investment in a 4 GHz T8000 (64 bit) with 1 Gbyte links
    and I bet that it could at least have had a niche for itself.

    I think general purpose wins, again -- just because of economies
    of scale. When you have millions of units of type X being sold,
    it's just so much cheaper to throw MANY of them at a problem,
    knowing that next year you can replace them with similar units
    that run twice as fast.

    I have ~300 dual core, 1.4GHz A53's in my current design. By the
    time I commit to a firm hardware specification, I suspect that
    will be quad core and/or 2GHz. For constant dollars. This has
    a powerfully liberating impact on how you approach the design;
    you can afford to build more abstract layers into it instead
    of trying to skimp in the name of (imagined) performance. You
    can adopt more conservative techniques to increase the reliability
    and availability of the design (e.g., I *never* reboot, let
    alone when installing new software!)

    I was involved in a geology simulation system with interactive
    graphics for Shell (1990). The alternative was the Cray,
    and the difference in costs was staggering.

    But how many such instances can you point to? How large of a
    market can you support before someone just brute forces a
    solution *around* you? It's hard to design when the performance
    *baseline* keeps creeping up faster than you can finish a design!

    I made a twin counting program as a demonstration. We (HCC)
    had a heterogeneous bunch of 60 transputers in total.
    I borrowed from military and educational institutes for a total
    cluster of 180 transputers. They worked well together, a hotchpotch
    of power supplies and transputer boxes. Imagine 180 386 boxes
    working together.

    The problem is always the "working together". People don't tend to think
    well "in parallel" so often don't come up with well-partitioned designs.
    This is especially true when you consider the cost of communication
    between physical processors when you have a mindset of just building
    a stack frame and passing control to <something>.

    https://home.hccnet.nl/a.w.m.van.der.horst/transputer.html

    The Forth compiler available through this page should work
    as long as a transputer link is operational.

    Parallel processing and multitasking are both complicated subjects,
    and one-size-fits-all solutions don't seem to exist.

    People who do special purpose electronic design do tend to have a
    grab-bag of techniques developed to solve other problems for other
    customers - John Fields could solve lots of problems with a 555, but my
    feeling was that a lot of his solutions were sub-optimal.

    I'm firmly convinced that the transputer route is not fully
    explored. Huawei beats Nvidia at AI, not with superior CPUs
    (although that is coming), but with superior inter-CPU communication.

    But the models they are implementing are a special case that doesn't
    translate well to the rest of the application domain. Will you now
    expect an "AI coprocessor" in every AI-enabled device? Or, will
    you expect designers to implement AI features alongside more
    conventional algorithms in more ubiquitous devices?
