Forum: >>> Magnum BBS <<<

Re: New ISA board to play with transputers

From Don Y@21:1/5 to Oscar Toledo G. on Sat Jul 5 23:12:10 2025

On 7/4/2025 3:30 PM, Oscar Toledo G. wrote:

I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay. I started with a prototype wired board on an ISA development card, and then I made a proper PCB in three iterations as I solved some bugs.

The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and an AMD K5).

The history of development is available at https://nanochess.org/transputer_board.html

The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Excellent! What did you learn from the experience (besides the
perils of rushing a PCB)? I.e., what value (or lack thereof) did the transputer offer?

Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
used GPIOs to twiddle the hardware -- and a USB interface to talk
to it? Or, was the ISA bus an important asset?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gerhard Hoffmann@21:1/5 to All on Sun Jul 6 11:37:54 2025

Am 06.07.25 um 08:12 schrieb Don Y:

On 7/4/2025 3:30 PM, Oscar Toledo G. wrote:

I've developed an ISA board to test some transputer boards (TRAM) I
bought on eBay. I started with a prototype wired board on an ISA development
card, and then I made a proper PCB in three iterations as I solved
some bugs.

The ISA connector was just because I have several old PC motherboards
(80286, 80486, a Pentium MMX, and an AMD K5).

The history of development is available at
https://nanochess.org/transputer_board.html

The schematics and PCB are available at
https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Excellent! What did you learn from the experience (besides the
perils of rushing a PCB)? I.e., what value (or lack thereof) did the transputer offer?

Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
used GPIOs to twiddle the hardware -- and a USB interface to talk
to it? Or, was the ISA bus an important asset?

In a previous life I had quite a huge T800 Transputer cluster and also
did some designs that connected to it.
The ISA bus was not important, but there was a link adaptor
chip (C11? - where is my bottle of Gerontol Forte?) that had a
SRAM-alike "foreign" side that made it easy to handle.
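
A minimal sketch of why a status-plus-data register interface like that is easy to drive. Everything here is an assumption for illustration: `FakeLinkAdaptor` is a simulated device that loops bytes back, and its method names are hypothetical, not the real chip's register map.

```python
from collections import deque


class FakeLinkAdaptor:
    """Simulated link adaptor: a byte-wide data register plus status flags.

    Hypothetical model for illustration only; output is looped back to input.
    """

    def __init__(self):
        self._wire = deque()  # bytes "in flight" on the serial link

    def output_ready(self):
        # A real part would report whether the previous byte was acknowledged.
        return True

    def input_ready(self):
        return bool(self._wire)

    def write_data(self, byte):
        self._wire.append(byte & 0xFF)

    def read_data(self):
        return self._wire.popleft()


def send(adaptor, payload):
    """Poll the output status, then store each byte like a memory write."""
    for byte in payload:
        while not adaptor.output_ready():
            pass
        adaptor.write_data(byte)


def receive(adaptor, count):
    """Poll the input status, then load each byte like a memory read."""
    received = bytearray()
    while len(received) < count:
        if adaptor.input_ready():
            received.append(adaptor.read_data())
    return bytes(received)


adaptor = FakeLinkAdaptor()
send(adaptor, b"occam")
echo = receive(adaptor, 5)
```

The host side never sees the serial protocol: it only polls a flag and moves a byte, which is what makes a bus-facing adaptor simple to design around.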

In
<https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
the link chip is between the Western Digital SCSI controller and the
VLSI serial/par IO chip.

Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy,
... Thanks Goddess I had someone to do the board layout in DOS Orcad STD
on a Compaq 286 :-)

Occam was fun. Maybe nowadays it would make a bigger impact with a
substantial number of CPUs on a chip.

Cheers, Gerhard

From Bill Sloman@21:1/5 to Oscar Toledo G. on Sun Jul 6 20:49:44 2025

On 5/07/2025 8:30 am, Oscar Toledo G. wrote:

Hi there.

I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay.
I started with a prototype wired board on an ISA development card, and then I made
a proper PCB in three iterations as I solved some bugs.

The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and an AMD K5).

The history of development is available at https://nanochess.org/transputer_board.html

The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Enjoy it!

This is very much legacy electronics. The transputer was a nice idea
when it was invented, but there are now (and pretty much always were)
different and better ways of solving the problems it addressed.

https://en.wikipedia.org/wiki/Transputer

--
Bill Sloman, Sydney

From Don Y@21:1/5 to All on Sun Jul 6 04:58:09 2025

In a previous life I had quite a huge T800 Transputer cluster and also
did some designs that connected to it.
The ISA bus was not important, but there was a link adaptor
chip (C11? - where is my bottle of Gerontol Forte?) that had a
SRAM-alike "foreign" side that made it easy to handle.

In <https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
the link chip is between the Western Digital SCSI controller and the
VLSI serial/par IO chip.

Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ... Thanks Goddess I had someone to do the board layout in DOS Orcad STD
on a Compaq 286 :-)

Occam was fun. Maybe nowadays it would make a bigger impact with a substantial number of CPUs on a chip.

But there have been countless (for small values of countless) concurrent
and parallel programming languages (as well as languages with memory
models that can usurp that ability).

People seem largely incapable of decomposing "programs" into concurrent activities *within* a language and, instead, seem to rely on mechanisms
outside the language (e.g., OS-hosted). My take on it is that
fine-grained concurrency is "too much detail" for most developers to
manage (except on special case applications).

[Of course, applications that are inherently SIMD/MIMD can be special-cased. But, the market has a sh*tload of applications that aren't so obviously so
and should be able to benefit from concurrency and parallelism. Designing
an application to fit WELL a multicore processor is a lot harder than it
seems it should be!]

Hence, we let compilers sort out where things can happen "in parallel"
and free ourselves from that minutiae, looking at parallelism/concurrency
in the model *design* at a higher level of abstraction, instead.
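
For a feel of what in-language decomposition looks like, here is a minimal sketch in the occam spirit: two processes joined by a channel. It assumes Python threads and a `queue.Queue` as stand-ins for occam's PAR and channels (occam channels are unbuffered rendezvous; the queue here is only an approximation), and the names `producer` and `squarer` are made up for illustration.

```python
import queue
import threading

DONE = object()  # sentinel marking end of stream


def producer(out_chan, n):
    """Emit 0..n-1 on the channel, then signal completion."""
    for i in range(n):
        out_chan.put(i)
    out_chan.put(DONE)


def squarer(in_chan, results):
    """Consume values from the channel until the sentinel arrives."""
    while True:
        item = in_chan.get()
        if item is DONE:
            break
        results.append(item * item)


# A one-slot queue, so the producer soon blocks until the consumer catches up.
chan = queue.Queue(maxsize=1)
results = []
workers = [
    threading.Thread(target=producer, args=(chan, 5)),
    threading.Thread(target=squarer, args=(chan, results)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The only thing the two activities share is the channel, which is the discipline occam enforced in the language rather than leaving it to the OS.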

As for the transputer hardware, it seemed to not provide enough, soon enough.

Another idea that was bulldozed away by less sophisticated -- but more
widely available -- solutions.

[E.g., why did the "pure" memory segmentation model fail to evolve beyond
the limited implementations initially offered? Why paged MMUs? etc.]

From john larkin@21:1/5 to All on Sun Jul 6 06:44:20 2025

On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
wrote:

In a previous life I had quite a huge T800 Transputer cluster and also
did some designs that connected to it.
The ISA bus was not important, but there was a link adaptor
chip (C11? - where is my bottle of Gerontol Forte?) that had a
SRAM-alike "foreign" side that made it easy to handle.

In
<https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
the link chip is between the Western Digital SCSI controller and the
VLSI serial/par IO chip.

Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
Thanks Goddess I had someone to do the board layout in DOS Orcad STD
on a Compaq 286 :-)

Occam was fun. Maybe nowadays it would make a bigger impact with a
substantial number of CPUs on a chip.

But there have been countless (for small values of countless) concurrent
and parallel programming languages (as well as languages with memory
models that can usurp that ability).

People seem largely incapable of decomposing "programs" into concurrent
activities *within* a language and, instead, seem to rely on mechanisms
outside the language (e.g., OS-hosted). My take on it is that
fine-grained concurrency is "too much detail" for most developers to
manage (except on special case applications).

[Of course, applications that are inherently SIMD/MIMD can be special-cased.
But, the market has a sh*tload of applications that aren't so obviously so
and should be able to benefit from concurrency and parallelism. Designing
an application to fit WELL a multicore processor is a lot harder than it
seems it should be!]

Hence, we let compilers sort out where things can happen "in parallel"
and free ourselves from that minutiae. Looking at parallelism/concurrency
in the model *design* at a higher level of abstraction, instead.

As for the transputer hardware, it seemed to not provide enough, soon enough.

Another idea that was bulldozed away by less sophisticated -- but more
widely available -- solutions.

[E.g., why did the "pure" memory segmentation model fail to evolve beyond
the limited implementations initially offered? Why paged MMUs? etc.]

Since CPU cores are trivial nowadays - they cost a few cents each -
the transputer concept may make sense again. We rely on an OS and
compiler tricks to get apparent parallelism, and the price is
complexity and bugs.

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

From Theo@21:1/5 to john larkin on Sun Jul 6 17:09:25 2025

john larkin <[email protected]> wrote:

Since CPU cores are trivial nowadays - they cost a few cents each -
the transputer concept may make sense again. We rely on an OS and
compiler tricks to get apparent parallelism, and the price is
complexity and bugs.

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

Intel tried that:
https://en.wikipedia.org/wiki/Xeon_Phi

(obviously using x86 was a bad idea, but apart from that...)

The issue is one of memory capacity and bandwidth. Many applications have a large (GB) dataset that doesn't partition nicely up between multiple nodes.

Even the largest FPGAs tend to have MB-scale amounts of memory on them, not
GB, because the memory density of a dedicated DRAM chip is so much better
than making on-chip BRAMs. It turns out to be more efficient to use a large external DRAM and drive it in a highly parallel way, pumping data through a GPU-style core, than it is to have lots of little cores individually
fetching single words from their local BRAM. With that model you also need
a fabric for the little cores to communicate, while with a big DRAM you get inter-core/thread communication for free - you just arrange to write to a different part of the shared dataset and the next consumer picks it up.

You can of course put GDDR or HBM on an FPGA, but it's the same problem -
only a few devices must be shared by numerous cores. Ultimately memory throughput beats latency hands down, especially for large datasets. This
was not such a problem in the Transputer's day, which is why that
architecture made sense.
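
A back-of-envelope sketch of the throughput-versus-latency point, comparing one dependent access at a time against a streamed transfer. All figures are hypothetical round numbers chosen only to show the scale; they are not measurements of any real device, and the function names are made up for illustration.

```python
def time_word_at_a_time(n_words, latency_s):
    """Each fetch waits for the previous one, so latency dominates."""
    return n_words * latency_s


def time_streamed(n_words, word_bytes, bandwidth_bytes_per_s, latency_s):
    """One start-up latency, then data arrives at full bandwidth."""
    return latency_s + (n_words * word_bytes) / bandwidth_bytes_per_s


# Assumed figures, for illustration only.
N_WORDS = 250_000_000   # a 1 GB dataset of 4-byte words
WORD_BYTES = 4
LATENCY_S = 100e-9      # assumed 100 ns per dependent access
BANDWIDTH = 100e9       # assumed 100 GB/s streaming bandwidth

serial = time_word_at_a_time(N_WORDS, LATENCY_S)
streamed = time_streamed(N_WORDS, WORD_BYTES, BANDWIDTH, LATENCY_S)
ratio = serial / streamed
```

With these made-up figures the dependent-access case takes about 25 s, against roughly 10 ms when streamed, which is why keeping many accesses in flight matters more than the latency of any single one once the dataset is large.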

Theo

From Don@21:1/5 to Oscar Toledo G. on Sun Jul 6 15:30:36 2025

Oscar Toledo G. <[email protected]> wrote:

Hi there.

I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay.
I started with a prototype wired board on an ISA development card, and then I made
a proper PCB in three iterations as I solved some bugs.

The ISA connector was just because I have several old PC motherboards (80286, 80486, a Pentium MMX, and an AMD K5).

The history of development is available at https://nanochess.org/transputer_board.html

The schematics and PCB are available at https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Enjoy it!

Great job!

My agenda includes mastering kicad 9 - including its interface to
ngspice - as soon as possible. My passion these days is exploiting Zynq
7000 SoCs to virtualize both lab instruments and radios.
Here's a sneak peek at my retro RF double balanced mixer work in
progress: <https://crcomp.net/mixer/> Although its hardware lash-up
was finished in no time, its webpage and rhetoric still need work.

Danke,

--
Don, KB7RPU, https://www.qsl.net/kb7rpu
There was a young lady named Bright Whose speed was far faster than light;
She set out one day In a relative way And returned on the previous night.

From Bill Sloman@21:1/5 to john larkin on Mon Jul 7 01:21:18 2025

On 6/07/2025 11:44 pm, john larkin wrote:

On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
wrote:

In a previous life I had quite a huge T800 Transputer cluster and also
did some designs that connected to it.
The ISA bus was not important, but there was a link adaptor
chip (C11? - where is my bottle of Gerontol Forte?) that had a
SRAM-alike "foreign" side that made it easy to handle.

In
<https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>
the link chip is between the Western Digital SCSI controller and the
VLSI serial/par IO chip.

Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
Thanks Goddess I had someone to do the board layout in DOS Orcad STD
on a Compaq 286 :-)

Occam was fun. Maybe nowadays it would make a bigger impact with a
substantial number of CPUs on a chip.

But there have been countless (for small values of countless) concurrent
and parallel programming languages (as well as languages with memory
models that can usurp that ability).

People seem largely incapable of decomposing "programs" into concurrent
activities *within* a language and, instead, seem to rely on mechanisms
outside the language (e.g., OS-hosted). My take on it is that
fine-grained concurrency is "too much detail" for most developers to
manage (except on special case applications).

[Of course, applications that are inherently SIMD/MIMD can be special-cased.
But, the market has a sh*tload of applications that aren't so obviously so
and should be able to benefit from concurrency and parallelism. Designing
an application to fit WELL a multicore processor is a lot harder than it
seems it should be!]

Hence, we let compilers sort out where things can happen "in parallel"
and free ourselves from that minutiae. Looking at parallelism/concurrency
in the model *design* at a higher level of abstraction, instead.

As for the transputer hardware, it seemed to not provide enough, soon enough.

Another idea that was bulldozed away by less sophisticated -- but more
widely available -- solutions.

[E.g., why did the "pure" memory segmentation model fail to evolve beyond
the limited implementations initially offered? Why paged MMUs? etc.]

Since CPU cores are trivial nowadays - they cost a few cents each -
the transputer concept may make sense again. We rely on an OS and
compiler tricks to get apparent parallelism, and the price is
complexity and bugs.

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

So all tasks are created equal? And dedicating a CPU to every last one
of them isn't an over-kill for most of them?

https://en.wikipedia.org/wiki/Transputer

does offer a slightly more sophisticated insight into why Inmos
eventually went bust - actually it was sold to SGS-Thomson (now STMicroelectronics).

Parallel processing and multitasking are both complicated subjects,
and one-size-fits-all solutions don't seem to exist.

People who do special purpose electronic design do tend to have a
grab-bag of techniques developed to solve other problems for other
customers - John Fields could solve lots of problems with a 555, but my
feeling was that a lot of his solutions were sub-optimal.

--
Bill Sloman, Sydney

From Tauno Voipio@21:1/5 to Don on Sun Jul 6 19:39:08 2025

On 6.7.2025 18.30, Don wrote:

Oscar Toledo G. <[email protected]> wrote:

Hi there.

I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay.
I started with a prototype wired board on an ISA development card, and then I made
a proper PCB in three iterations as I solved some bugs.

The ISA connector was just because I have several old PC motherboards (80286,
80486, a Pentium MMX, and an AMD K5).

The history of development is available at
https://nanochess.org/transputer_board.html

The schematics and PCB are available at
https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Enjoy it!

Great job!

My agenda includes mastering kicad 9 - including its interface to
ngspice - as soon as possible. My passion these days is exploiting Zynq
7000 SoCs to virtualize both lab instruments and radios.
Here's a sneak peek at my retro RF double balanced mixer work in
progress: <https://crcomp.net/mixer/> Although its hardware lash-up
was finished in no time, its webpage and rhetoric still need work.

Danke,

Don,

You do need to use the center tap on TR2 to get anything
but a sophisticated short circuit.

--

-TV

From john larkin@21:1/5 to [email protected] on Sun Jul 6 10:46:03 2025

On 06 Jul 2025 17:09:25 +0100 (BST), Theo
<[email protected]> wrote:

john larkin <[email protected]> wrote:

Since CPU cores are trivial nowadays - they cost a few cents each -
the transputer concept may make sense again. We rely on an OS and
compiler tricks to get apparent parallelism, and the price is
complexity and bugs.

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

Intel tried that:
https://en.wikipedia.org/wiki/Xeon_Phi

(obviously using x86 was a bad idea, but apart from that...)

The issue is one of memory capacity and bandwidth. Many applications have a
large (GB) dataset that doesn't partition nicely up between multiple nodes.

Even the largest FPGAs tend to have MB-scale amounts of memory on them, not
GB, because the memory density of a dedicated DRAM chip is so much better
than making on-chip BRAMs. It turns out to be more efficient to use a large
external DRAM and drive it in a highly parallel way, pumping data through a
GPU-style core, than it is to have lots of little cores individually
fetching single words from their local BRAM. With that model you also need
a fabric for the little cores to communicate, while with a big DRAM you get
inter-core/thread communication for free - you just arrange to write to a
different part of the shared dataset and the next consumer picks it up.

You can of course put GDDR or HBM on an FPGA, but it's the same problem -
only a few devices must be shared by numerous cores. Ultimately memory
throughput beats latency hands down, especially for large datasets. This
was not such a problem in the Transputer's day, which is why that
architecture made sense.

Theo

Seems a shame to have an x86 core wasting time handling ethernet and
printers and mice and memory sticks when they could be doing better
things like running Spice.

My Windows 11 thing is running hundreds of processes right now. That's
crazy.

Computing is a mess. A new hardware architecture would at least
suggest a fresh start.

From Don Y@21:1/5 to Theo on Sun Jul 6 11:57:41 2025

On 7/6/2025 9:09 AM, Theo wrote:

The issue is one of memory capacity and bandwidth. Many applications have a large (GB) dataset that doesn't partition nicely up between multiple nodes.

You can of course put GDDR or HBM on an FPGA, but it's the same problem - only a few devices must be shared by numerous cores. Ultimately memory throughput beats latency hands down, especially for large datasets. This
was not such a problem in the Transputer's day, which is why that architecture made sense.

Exactly. Partitioning an application "by task" only makes sense if the tasks are orthogonal -- AND can have their own dedicated resources. The memory interface determines performance. In software "communication" drives performance and complexity -- the more things have to share the poorer
the design.

[The exception: SIMD]

So, you need to think about the medium used for interconnect fabric.
Shared memory has pitfalls, too -- because it has such a high bandwidth
and protection mechanisms -- in software or hardware -- tend to want to
be light weight (e.g., you wouldn't want a monitor per datum).

Remember, the mantra is to partition to MINIMIZE sharing.

If the "sharers" need a fat pipe between them, then the medium must
directly (or indirectly) support that.

If the "sharers" require a short pipe, then that becomes an issue
(which need not be mutually exclusive with the pipe's width).

So, you have to partition the application to ensure things can
communicate "fast enough" and "soon enough" -- and still address
the folly of "one shared address space".

This can lead to suboptimal hardware implementations. E.g., I have a CPU
per camera instead of a "camera CPU" handling ALL cameras. So, 20 copies
of the same hardware and software all doing essentially the same thing;
but, I can just as easily have *40* copies (whereas a "camera CPU" would eventually be taxed computationally -- will the scale of your
application change, over time? how will that affect your partitioning?).

And, once you have an abundance of computational ability, you then need
to address WHAT gets done WHERE and how that decision will change, over
time. E.g., if the application isn't currently using a camera, then how
can the resources PHYSICALLY set aside for that camera be used to achieve
some other goal? (Ditto any other physical I/Os) Will your rejiggering
of "virtual" resource allocation still "fit" the above communication criteria?

The appeal (from a complexity, reliability, maintainability point of view)
of a well-partitioned system is illusory given practical constraints.

Think of all the MIPS wasted by the CPU in your thermostat that the
CPU in your refrigerator could supply! And, how many are wasted,
there, that could be used by your television/STB? Ah, but SO much
easier to design something that is JUST a thermostat or JUST a
refrigerator... than to design something that can be all of the above!

From Don@21:1/5 to Tauno Voipio on Sun Jul 6 20:48:06 2025

Tauno Voipio wrote:

Don wrote:

Oscar Toledo G. wrote:

Hi there.

I've developed an ISA board to test some transputer boards (TRAM) I bought on eBay.
I started with a prototype wired board on an ISA development card, and then I made
a proper PCB in three iterations as I solved some bugs.

The ISA connector was just because I have several old PC motherboards (80286,
80486, a Pentium MMX, and an AMD K5).

The history of development is available at
https://nanochess.org/transputer_board.html

The schematics and PCB are available at
https://github.com/nanochess/transputer/pcb

In the same git you can get my operating system developed in 1993-1996.

Enjoy it!

Great job!

My agenda includes mastering kicad 9 - including its interface to
ngspice - as soon as possible. My passion these days is exploiting Zynq
7000 SoCs to virtualize both lab instruments and radios.
Here's a sneak peek at my retro RF double balanced mixer work in
progress: <https://crcomp.net/mixer/> Although its hardware lash-up
was finished in no time, its webpage and rhetoric still need work.

Don,

You do need to use the center tap on TR2 to get anything
but a sophisticated short circuit.

The schematic is a work in progress. Its 100 ohm load resistor is
missing along with terminators for the Local Oscillator and Radio
Frequency inputs and Intermediate Frequency output.
Oscilloscope outputs from the ADAML2000 and a Tektronix now
appear at my webpage: <https://crcomp.net/mixer/> . The virtualized
scope does a good job of tracking its analog analog (so to speak).

The LTSpice simulation in AD's tutorial looks great. AD's practical
circuit, as shown in its tutorial, leans a little towards a
"sophisticated short circuit." Ergo, the motivation for my webpage.

Danke,

--
Don, KB7RPU, https://www.qsl.net/kb7rpu
There was a young lady named Bright Whose speed was far faster than light;
She set out one day In a relative way And returned on the previous night.

From Theo@21:1/5 to john larkin on Mon Jul 7 17:21:11 2025

john larkin <[email protected]> wrote:

Seems a shame to have an x86 core wasting time handling ethernet and
printers and mice and memory sticks when they could be doing better
things like running Spice.

Many of those things are already happening outboard anyway - all those
things have processors in them. What the CPU is doing is largely managing
the data transfer to and from the device. eg the printer speaks PCL or Postscript and the OS's workload is limited to firing the job at the printer (USB/network) and the printer's CPU then decides where to put the ink on
the page.

You can delegate that management oversight to another core if you like, but then you need management oversight of *that* core.

My Windows 11 thing is running hundreds of processes right now. That's
crazy.

Windows problems :-) But many of those things don't need to take much CPU - they're ready to handle print jobs when you press Ctrl-P, but the rest of
the time they're ticking along in the background not taking much resources because they don't need them.

The OS is running thousands of kernel threads, but they're mostly blocked
(not scheduled) until they need to do something. One thread per 'thing',
more or less. All that thread needs is a few hundred bytes for its register state so the impact is small.
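
A minimal sketch of the "mostly blocked" point, assuming ordinary Python threads as a stand-in for kernel threads (they are not the same thing, but they park on a wait in much the same way). The names `handler` and `wake` are made up for illustration.

```python
import threading

wake = threading.Event()
finished = []
lock = threading.Lock()


def handler(ident):
    """One thread per 'thing': sleep until there is work, then do it."""
    wake.wait()  # blocked here: not scheduled, so it burns no CPU
    with lock:
        finished.append(ident)


threads = [threading.Thread(target=handler, args=(i,)) for i in range(200)]
for t in threads:
    t.start()

# Every thread exists, but all of them are parked on the event.
idle_count = sum(t.is_alive() for t in threads)

wake.set()  # the "Ctrl-P" moment: now there is something to do
for t in threads:
    t.join()
```

Two hundred handlers sit there costing only their bookkeeping until the event fires, which is roughly the situation with a process list that looks alarming but is almost entirely idle.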

Computing is a mess. A new hardware architecture would at least
suggest a fresh start.

Non-Windows, non-x86 architectures are available...

Theo

From john larkin@21:1/5 to [email protected] on Tue Jul 8 03:10:22 2025

On 07 Jul 2025 17:21:11 +0100 (BST), Theo
<[email protected]> wrote:

john larkin <[email protected]> wrote:

Seems a shame to have an x86 core wasting time handling ethernet and
printers and mice and memory sticks when they could be doing better
things like running Spice.

Many of those things are already happening outboard anyway - all those
things have processors in them. What the CPU is doing is largely managing
the data transfer to and from the device. eg the printer speaks PCL or
Postscript and the OS's workload is limited to firing the job at the printer
(USB/network) and the printer's CPU then decides where to put the ink on
the page.

You can delegate that management oversight to another core if you like, but
then you need management oversight of *that* core.

My Windows 11 thing is running hundreds of processes right now. That's
crazy.

Windows problems :-) But many of those things don't need to take much CPU -
they're ready to handle print jobs when you press Ctrl-P, but the rest of
the time they're ticking along in the background not taking much resources
because they don't need them.

The OS is running thousands of kernel threads, but they're mostly blocked
(not scheduled) until they need to do something. One thread per 'thing',
more or less. All that thread needs is a few hundred bytes for its register
state so the impact is small.

Computing is a mess. A new hardware architecture would at least
suggest a fresh start.

Non-Windows, non-x86 architectures are available...

Theo

The x86 is nearly the peak of the silly concept that the CPU is a big
deal. Intel is heavily invested in that idea. ARM and RISC-V cores are
fast and cheap and basically trivial amounts of silicon. We can have a
zillion CPUs on a chip so don't benefit from the brutal complexity and
inefficiency of trying to share just a few big ugly CPUs among
hundreds of processes.

We use the RP2040 chip in some products. It's a dual-core 133 MHz ARM
with lots of cute peripherals, including hardware state machines.
It's 75 cents in any quantity. On the new version, the RP2350, they
threw in a couple of RISC-V cores just for fun.

From john larkin@21:1/5 to [email protected] on Tue Jul 8 08:27:11 2025

On Tue, 8 Jul 2025 11:18:49 +0100, John R Walliker
<[email protected]> wrote:

On 08/07/2025 11:10, john larkin wrote:

On 07 Jul 2025 17:21:11 +0100 (BST), Theo
<[email protected]> wrote:

john larkin <[email protected]> wrote:

Seems a shame to have an x86 core wasting time handling ethernet and
printers and mice and memory sticks when they could be doing better
things like running Spice.

Many of those things are already happening outboard anyway - all those
things have processors in them. What the CPU is doing is largely managing
the data transfer to and from the device. eg the printer speaks PCL or
Postscript and the OS's workload is limited to firing the job at the printer
(USB/network) and the printer's CPU then decides where to put the ink on
the page.

You can delegate that management oversight to another core if you like, but
then you need management oversight of *that* core.

My Windows 11 thing is running hundreds of processes right now. That's
crazy.

Windows problems :-) But many of those things don't need to take much CPU -
they're ready to handle print jobs when you press Ctrl-P, but the rest of
the time they're ticking along in the background not taking much resources
because they don't need them.

The OS is running thousands of kernel threads, but they're mostly blocked
(not scheduled) until they need to do something. One thread per 'thing',
more or less. All that thread needs is a few hundred bytes for its register
state so the impact is small.

Computing is a mess. A new hardware architecture would at least
suggest a fresh start.

Non-Windows, non-x86 architectures are available...

Theo

The x86 is nearly the peak of the silly concept that the CPU is a big
deal. Intel is heavily invested in that idea. ARM and RISC-V cores are
fast and cheap and basically trivial amounts of silicon. We can have a
zillion CPUs on a chip so don't benefit from the brutal complexity and
inefficiency of trying to share just a few big ugly CPUs among
hundreds of processes.

We use the RP2040 chip in some products. It's a dual-core 133 MHz ARM
with lots of cute peripherals, including hardware state machines.
It's 75 cents in any quantity. On the new version, the RP2350, they
threw in a couple of RISC-V cores just for fun.

Maybe "just for fun" but it might give them a stronger position
when negotiating royalty rates with ARM.
John

Yes, my thoughts too.

I wonder how much they pay to license the ARM. It can't be much, with
two cores on a 75 cent chip.

Intel stock is at about half its peak. Market cap is $101 billion.

Nvidia stock is at an all-time peak and worth $3.9 trillion.

CISC is *so* last millennium.

From Don Y@21:1/5 to All on Tue Jul 8 11:10:05 2025

Many of those things are already happening outboard anyway - all those
things have processors in them.

You don't have to look to "peripherals" to find additional processors.
Many NICs are now implemented as CPUs. Ditto video subsystems (GPUs,
of course). The keyboard *interface* on the original PC was a microcontroller. Ditto sound subsystems, etc.

What the CPU is doing is largely managing
the data transfer to and from the device. eg the printer speaks PCL or Postscript and the OS's workload is limited to firing the job at the printer (USB/network) and the printer's CPU then decides where to put the ink on
the page.

You can delegate that management oversight to another core if you like, but then you need management oversight of *that* core.

Specialized processors (co-processors, if you will) are usually optimized for specific sorts of uses. The CPU in a NIC is probably piss poor at doing floating point math!

Windows problems :-) But many of those things don't need to take much CPU - they're ready to handle print jobs when you press Ctrl-P, but the rest of
the time they're ticking along in the background not taking much resources because they don't need them.

It's "good design". Folks who write "one big program" are living in the 60's. Your goal (for reliability and correctness) is always to compartmentalize and reduce (in complexity) "tasks". Reduce the amount of sharing AND make it
very visible: global variables are the sign of an immature development
style (how do you know ONLY the "right" accesses are happening? why
SHOULD a particular activity NEED to see some datum?)

We have interrupt service routines. Aren't they essentially separate processes?

For more than 40 years, I've used one supervisory process per ISR;
NOTHING should be talking to the ISR besides that process ("discipline").
This sort of structure makes it easy to impose additional layers
of functionality on those resources.

[E.g., in the days of serial terminals, how could you ensure the current
time of day would be displayed in the lower right corner of the screen
if "anybody" could push characters out the serial port? If, instead,
you require them to go through an agent, then you can have that agent
present a "message oriented" interface and have *it* systematically
access the underlying driver/ISR]
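
A minimal sketch of that agent idea, in Python rather than anything period-appropriate. The names (`SerialAgent`, `post`, `show_clock`) and the `write` callable standing in for the driver/ISR are made up for illustration, and the escape codes assume a VT100-style 80x24 terminal. The only point is that clients hand whole messages to a single owner, which is then free to interleave its own output (here, the clock) without anyone else's characters landing in the middle.

```python
import queue
import threading


class SerialAgent:
    """Sole owner of the (hypothetical) serial driver; clients only post messages."""

    def __init__(self, write):
        self._write = write            # stand-in for the real driver/ISR interface
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def post(self, text):
        """Message-oriented client interface: queue one whole message."""
        self._inbox.put(("text", text))

    def show_clock(self, hhmm):
        """Extra layer of functionality: the agent draws the clock itself."""
        self._inbox.put(("clock", hhmm))

    def close(self):
        """Ask the worker to stop and wait until the queue has drained."""
        self._inbox.put(None)
        self._worker.join()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:
                break
            kind, payload = item
            if kind == "clock":
                # Save cursor, go to row 24 / column 76, print, restore cursor.
                self._write("\x1b7\x1b[24;76H" + payload + "\x1b8")
            else:
                self._write(payload)


sent = []                              # records what reached the "wire"
agent = SerialAgent(sent.append)
agent.post("hello ")
agent.show_clock("11:10")
agent.post("world")
agent.close()
```

Because only the agent's thread ever calls `write`, each message reaches the port intact and in the order it was queued; adding logging, rate limiting or access checks later means touching one place.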

The OS is running thousands of kernel threads, but they're mostly blocked (not scheduled) until they need to do something. One thread per 'thing', more or less. All that thread needs is a few hundred bytes for its register state so the impact is small.

But threads share an address space, so they can screw with each other.
A cleaner, more robust design is to use separate process containers
for each "thing". So, your network stack can't screw up your
file system code.
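
A toy illustration of the difference, assuming nothing more than a Python interpreter (the `fs_state` dictionary and `buggy_network_code` are invented for the example). The same stray write is made once from a thread and once from a child process. The thread could of course be handed a copy too; the point is that with a separate process the isolation is enforced rather than left to discipline.

```python
import json
import subprocess
import sys
import threading

fs_state = {"free_blocks": [1, 2, 3]}    # pretend file-system bookkeeping


def buggy_network_code(state):
    """Stand-in for a stray write from unrelated code."""
    state["free_blocks"].clear()


# Case 1: same address space. The thread works on the live object.
victim = {"free_blocks": [1, 2, 3]}
t = threading.Thread(target=buggy_network_code, args=(victim,))
t.start()
t.join()
thread_damage = victim["free_blocks"]    # emptied by the other thread

# Case 2: separate process. The child only ever sees a serialized copy.
child_src = (
    "import json, sys\n"
    "state = json.load(sys.stdin)\n"
    "state['free_blocks'].clear()\n"     # same bug, different address space
    "print(json.dumps(state))\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_src],
    input=json.dumps(fs_state),
    capture_output=True,
    text=True,
    check=True,
)
child_view = json.loads(result.stdout)   # what the child ended up with
```

After both runs, `victim` has lost its block list while `fs_state` in the parent is untouched; the child trashed only its own copy. The price is the serialization and process start-up, which is exactly the sort of performance hit being argued for here.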

[I think the average Linux release now contains over 1000 bugs! Had
they shifted to FIXING things instead of adding and refactoring,
they could actually improve the quality of the codebase! Ah, but
we can adopt the MS philosophy of just pushing out updates every
week or two! "Mr Smith, could you please bring your vehicle in
for service this week? We'd like to upgrade the brakes to
reduce your stopping distance. And, while we're at it, also
fix something we broke on your previous visit, LAST week!"]

[[I love this: <https://lwn.net/Articles/914632/>
"Also noteworthy is 96c8395e2166 ("spi: Revert modalias changes"),
which deleted six lines of code and has required 24 fixes thereafter."
20 years later and they're still finding bugs in THAT release??]]

[[[Yet another: <https://stack.watch/product/linux/> Note
that vulnerabilities may include *design* flaws and not just
"bugs"]]]

We have delightful hardware available, nowadays. And, affordable.
You no longer have to worry about whether to use a MPY opcode
or have to write a mult() routine. And, can use floating point
almost as easily as integer and fixed point math. You can
use virtual memory, EDAC to improve hardware "reliability",
isolated process containers, etc.

Yet, people still obsess over "make it faster". Um, how about make
it RIGHT, first? Lots of "best practices" have performance hits
(e.g., isolated process containers, lack of globals, capability
based designs, etc.) but dramatically increase the quality of the code.
Yet, folks still opt for fast-and-broken over slowER-and-correct!
Really, how many orders of magnitude FASTER does the hardware
have to get before you stop obsessing over speed? Does the
PRODUCT COST of a system vary THAT much that saving a few pennies on
a slower processor is really going to translate to additional
sales or profits? What cost do you associate with (lack of) quality?

From Don Y@21:1/5 to Oscar Toledo G. on Sun Jul 13 18:40:00 2025

On 7/13/2025 2:27 PM, Oscar Toledo G. wrote:

In the same git you can get my operating system developed in 1993-1996.

Excellent! What did you learn from the experience (besides the
perils of rushing a PCB)? I.e., what value (or lack thereof) did the
transputer offer?

The transputer doesn't actually offer any value. I only did it to re-run my old software, and because I was curious about running Occam. Installing Occam and trying to get things running was not as amazing an experience as I expected.

Even disappointments are learning experiences.

Once the Pentium was out, the transputer was basically obsolete.

Yes. My point is that "inferior" solutions can win if they become
ubiquitous. In a sense, the x86 family did as much harm as benefit to "computing".

It definitely stifled far more promising (big iron) architectures
that could have made a more meaningful impact on the state of the art!

Could you, perhaps, have used a small SBC (arduino, rPi, etc.) and
used GPIOs to twiddle the hardware -- and a USB interface to talk
to it? Or, was the ISA bus an important asset?

It is my first "complicated" circuit board, so I didn't want to munch more than necessary.

The ISA bus is an important asset, because there's no point in putting a transputer board in a Pentium machine, but I have an 80286 where the transputer's
speed is obvious.

I have several "add in" coprocessors which, for their time, were worth the effort. (Anyone remember Weitek FPUs?)

My point, though, was whether going with a USB-based interface
might have made your effort easier? I.e., use a COTS board (Arduino, etc.)
to GIVE you the interface to the host PC (so you don't have to develop your
own USB stack) and just make a small adapter board that goes from the COTS board (e.g., Arduino) to the transputer.

Or, use a network interface on such a COTS board to talk to the MCU and
let it talk to the transputer.

[I've no idea how fat a pipe the transputer needs to do anything meaningful
on the PC]

The advantage of either approach would be your "interface" would be more portable to different machines -- even non-PCs.

From Don Y@21:1/5 to Oscar Toledo G. on Sun Jul 13 21:34:20 2025

On 7/13/2025 8:46 PM, Oscar Toledo G. wrote:

My point, though, was whether going with a USB-based interface
might have made your effort easier? I.e., use a COTS board (Arduino, etc.)
to GIVE you the interface to the host PC (so you don't have to develop your
own USB stack) and just make a small adapter board that goes from the COTS board (e.g., Arduino) to the transputer.

Or, use a network interface on such a COTS board to talk to the MCU and
let it talk to the transputer.

[I've no idea how fat a pipe the transputer needs to do anything meaningful
on the PC]

The advantage of either approach would be your "interface" would be more
portable to different machines -- even non-PCs.

I get your point now. I never thought about it. But I published the schematics
in case someone wants to build another type of interface.

The multiplatform option is currently my transputer emulator in JavaScript. It runs on PC, Mac, or Linux; just visit my webpage at https://nanochess.org/transputer_emulator.html

Yes. It is amusing how we can run almost cycle-accurate emulations, now,
just due to the advances in process technology.

I think there is already a very good Raspberry Pi project for transputers where you can connect a transputer, or use a Pi to emulate the transputer.

OK. I wouldn't know as Pis aren't of any interest to me.

Maybe it could be a mathematical coprocessor for Arduino, but it would be too expensive.

No, I think the whole point is what you've addressed -- for people to be able to explore a technology that is no longer (practically) available. E.g.,
run MULTICS even though you can't find a real '645.

Sadly, too much variety and innovation has been lost to boring designs that have taken over markets. To have one's feet firmly planted in the 1970's
(save for process improvements).

From [email protected]@21:1/5 to [email protected] on Fri Aug 1 11:54:44 2025

In article <104e49f$28h74$[email protected]>,
Bill Sloman <[email protected]> wrote:

On 6/07/2025 11:44 pm, john larkin wrote:

On Sun, 6 Jul 2025 04:58:09 -0700, Don Y <[email protected]d>
wrote:

In a previous life I had quite a huge T800 Transputer cluster and also
did some designs that connected to it.
The ISA bus was not important, but there was a link adaptor
chip (C11? - where is my bottle of Gerontol Forte?) that had a
SRAM-alike "foreign" side that made it easy to handle.

In

<https://www.flickr.com/photos/137684711@N07/52631074700/in/datetaken/lightbox/>

the link chip is between the Western Digital SCSI controller and the
VLSI serial/par IO chip.

Complete industrial PC/AT with Multibus2, lots of DRAM, disks, floppy, ...
Thanks Goddess I had someone to do the board layout in DOS Orcad STD
on a Compaq 286 :-)

Occam was fun. Maybe nowadays it would make a bigger impact with a
substantial number of CPUs on a chip.

But there have been countless (for small values of countless) concurrent
and parallel programming languages (as well as languages with memory
models that can usurp that ability).

People seem largely incapable of decomposing "programs" into concurrent
activities *within* a language and, instead, seem to rely on mechanisms
outside the language (e.g., OS-hosted). My take on it is that
fine-grained concurrency is "too much detail" for most developers to
manage (except on special case applications).

[Of course, applications that are inherently SIMD/MIMD can be special-cased.
But, the market has a sh*tload of applications that aren't so obviously so
and should be able to benefit from concurrency and parallelism. Designing
an application to fit WELL a multicore processor is a lot harder than it
seems it should be!]

Hence, we let compilers sort out where things can happen "in parallel"
and free ourselves from that minutiae. Looking at parallelism/concurrency
in the model *design* at a higher level of abstraction, instead.

As for the transputer hardware, it seemed to not provide enough, soon enough.

Another idea that was bulldozed away by less sophisticated -- but more
widely available -- solutions.

[E.g., why did the "pure" memory segmentation model fail to evolve beyond
the limited implementations initially offered? Why paged MMUs? etc.]

Since CPU cores are trivial nowadays - they cost a few cents each -
the transputer concept may make sense again. We rely on an OS and
compiler tricks to get apparent parallelism, and the price is
complexity and bugs.

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

So all tasks are created equal? And dedicating a CPU to every last one
of them isn't an over-kill for most of them?

This doesn't take into account the architecture of the CPU.
There is no need to dedicate a CPU to a task, because multiple
tasks can be mapped onto one CPU.

https://en.wikipedia.org/wiki/Transputer

does offer a slightly more sophisticated insight into why Inmos
eventually went bust - actually it was sold to SGS-Thomson (now
STMicroelectronics).

The nodes in a parallel system shouldn't be low-end, but on
a par with the most cost-effective CPU of the era.
There was no investment in a 4 GHz T8000 (64 bit) with 1 Gbyte links
and I bet that it could at least have had a niche for itself.

I was involved in a geology simulation system with interactive
graphics for Shell (1990). The alternative was the Cray,
and the difference in cost was staggering.

I made a twin counting program as a demonstration. We (HCC)
had a heterogeneous bunch of 60 transputers in total.
I borrowed from military and educational institutes for a total
cluster of 180 transputers. They worked well together, a hotchpotch
of power supplies and transputer boxes. Imagine 180 386 boxes
working together.

https://home.hccnet.nl/a.w.m.van.der.horst/transputer.html

The Forth compiler available through this page should work
as long as a transputer link is operational.

Parallel processing and multitasking are both complicated subjects,
and one-size-fits-all solutions don't seem to exist.

People who do special purpose electronic design do tend to have a
grab-bag of techniques developed to solve other problems for other
customers - John Fields could solve lots of problems with a 555, but my
feeling was that a lot of his solutions were sub-optimal.

I'm firmly convinced that the transputer route is not fully
explored. Huawei beats Nvidia at AI, not with superior CPUs
(although that is coming), but with superior inter-CPU communication.

--
Bill Sloman, Sydney

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From Don Y@21:1/5 to [email protected] on Fri Aug 1 07:56:08 2025

On 8/1/2025 2:54 AM, [email protected] wrote:

Why not have a CPU per task? Each with a decent chunk of dedicated
fast ram?

So all tasks are created equal? And dedicating a CPU to every last one
of them isn't an over-kill for most of them?

This doesn't take into account the architecture of the CPU.
There is no need to dedicate a CPU to a task, because multiple
tasks can be mapped onto one CPU.

Exactly. You either overprovision all of the nodes so the most
challenging task can meet its performance goals OR leave many
nodes with fewer resources than they actually need.

Easier (and tried-and-true) to just use the excess capacity at each
(and every!) node to handle some other task -- or part thereof.

You don't get true parallelism but, most often, concurrency is
sufficient for the /application/.
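
To put rough numbers on that trade-off, here is a small sketch. The task loads are invented, every node is assumed identical and sized so the heaviest task just fits, and first-fit-decreasing is just a convenient packing heuristic, not a claim about how any real scheduler places tasks.

```python
def nodes_one_cpu_per_task(loads):
    """One dedicated node per task, regardless of how small the task is."""
    return len(loads)


def nodes_first_fit_decreasing(loads, capacity=1.0):
    """Pack tasks onto shared nodes with the first-fit-decreasing heuristic."""
    free = []                               # remaining capacity of each node in use
    for load in sorted(loads, reverse=True):
        for i, room in enumerate(free):
            if load <= room + 1e-9:         # tolerate float rounding
                free[i] = room - load
                break
        else:
            free.append(capacity - load)    # nothing fits: open a new node
    return len(free)


# Hypothetical task loads, as fractions of one node's capacity.
loads = [0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.05, 0.05]

dedicated = nodes_one_cpu_per_task(loads)      # 8 nodes
shared = nodes_first_fit_decreasing(loads)     # 3 nodes
idle_if_dedicated = dedicated - sum(loads)     # node-equivalents sitting idle
```

With these made-up loads, eight dedicated nodes do about 2.15 nodes' worth of work, while sharing gets the same tasks onto three. What the sketch ignores is exactly the hard part: deadlines, and the cost of tasks on different nodes talking to each other.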

https://en.wikipedia.org/wiki/Transputer

does offer a slightly more sophisticated insight into why Inmos
eventually went bust - actually it was sold to SGS-Thomson (now
STMicroelectronics).

The nodes in a parallel system shouldn't be low-end, but on
a par with the most cost-effective CPU of the era.

... subject to the constraint that they can meet the needs of
the application.

I recall a design with 1024 8051-class processors. Lots of potential
MIPS but the communication costs were on a par with the computation.
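
A back-of-envelope model of why that hurts, not a description of that particular design. It assumes the work splits evenly, that each node's communication time does not overlap its computation, and that only the ratio of the two matters; the function names are mine.

```python
def parallel_efficiency(compute_per_node, comm_per_node):
    """Fraction of each node's time spent on useful computation."""
    return compute_per_node / (compute_per_node + comm_per_node)


def effective_speedup(nodes, compute_per_node, comm_per_node):
    """Speedup over a single node doing all the work with no communication."""
    return nodes * parallel_efficiency(compute_per_node, comm_per_node)


# Hypothetical time units; only the compute/communication ratio matters.
even_split = effective_speedup(1024, compute_per_node=1.0, comm_per_node=1.0)
comm_heavy = effective_speedup(1024, compute_per_node=1.0, comm_per_node=3.0)
```

When communication is on a par with computation, the 1024 nodes deliver the throughput of 512; let communication take three times as long as the computing and it drops to 256. In practice the communication term usually grows with the node count as well, which this toy model leaves out.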

There was no investment in a 4 GHz T8000 (64 bit) with 1 Gbyte links
and I bet that it could at least have had a niche for itself.

I think general purpose wins, again -- just because of economies
of scale. When you have millions of units of type X being sold,
it's just so much cheaper to throw MANY of them at a problem,
knowing that next year you can replace them with similar units
that run twice as fast.

I have ~300 dual core, 1.4GHz A53's in my current design. By the
time I commit to a firm hardware specification, I suspect that
will be quad core and/or 2GHz. For constant dollars. This has
a powerfully liberating impact on how you approach the design;
you can afford to build more abstract layers into it instead
of trying to skimp in the name of (imagined) performance. You
can adopt more conservative techniques to increase the reliability
and availability of the design (e.g., I *never* reboot, let
alone when installing new software!)

I was involved in a geology simulation system with interactive
graphics for Shell (1990). The alternative was the Cray,
and the difference in cost was staggering.

But how many such instances can you point to? How large of a
market can you support before someone just brute forces a
solution *around* you? It's hard to design when the performance
*baseline* keeps creeping up faster than you can finish a design!

I made a twin counting program as a demonstration. We (HCC)
had a heterogeneous bunch of 60 transputers in total.
I borrowed from military and educational institutes for a total
cluster of 180 transputers. They worked well together, a hotchpotch
of power supplies and transputer boxes. Imagine 180 386 boxes
working together.

The problem is always the "working together". People don't tend to think
well "in parallel" so often don't come up with well-partitioned designs.
This is especially true when you consider the cost of communication
between physical processors when you have a mindset of just building
a stack frame and passing control to <something>.

https://home.hccnet.nl/a.w.m.van.der.horst/transputer.html

The Forth compiler available through this page should work
as long as a transputer link is operational.

Parallel processing and multitasking are both complicated subjects,
and one-size-fits-all solutions don't seem to exist.

People who do special purpose electronic design do tend to have a
grab-bag of techniques developed to solve other problems for other
customers - John Fields could solve lots of problems with a 555, but my
feeling was that a lot of his solutions were sub-optimal.

I'm firmly convinced that the transputer route is not fully
explored. Huawei beats Nvidia at AI, not with superior CPUs
(although that is coming), but with superior inter-CPU communication.

But the models they are implementing are a special case that doesn't
translate to the rest of the application domain, well. Will you now
expect an "AI coprocessor" in every AI-enabled device? Or, will
you expect designers to implement AI features alongside more
conventional algorithms in more ubiquitous devices?

