Forum: >>> Magnum BBS <<<

Is Parallel Programming Hard, And, If So, What Can You Do About It?

From Thomas Koenig@21:1/5 to All on Sat May 10 11:38:46 2025

For those who don't know it: This is the title of a book on,
you guessed it, parallel programming (the "perfbook"), from the
perspective of a Linux developer, Paul E. McKenney.

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

Much of it should be familiar to many contributors to comp.arch,
but certainly not everything will be familiar to everyone (if I
take myself as an example). It also contains a little appendix
entitled "Advice to Hardware Designers", which is interesting.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to All on Sun May 11 00:04:58 2025

Summary:: Devices need just as much cache coherence as cores--maybe
more.


From Scott Lurndal@21:1/5 to [email protected] on Sun May 11 13:59:56 2025

[email protected] (MitchAlsup1) writes:

Summary:: Devices need just as much cache coherence as cores--maybe
more.

Does a uart need cache coherence? How about a SPI or MMC controller?


From Al Kossow@21:1/5 to Scott Lurndal on Sun May 11 07:47:53 2025

On 5/11/25 6:59 AM, Scott Lurndal wrote:

[email protected] (MitchAlsup1) writes:

Summary:: Devices need just as much cache coherence as cores--maybe
more.

Does a uart need cache coherence? How about a SPI or MMC controller?

I had wondered about SoCs like the RPi Pico.
With the narrow memory interfaces,
are cores starved for memory bandwidth?


From Lawrence D'Oliveiro@21:1/5 to Al Kossow on Sun May 11 23:46:17 2025

On Sun, 11 May 2025 07:47:53 -0700, Al Kossow wrote:

With the narrow memory interfaces, are cores starved for memory
bandwidth?

Low memory bandwidth has been a chronic issue since the “wait states” of the 1980s.

That’s why we have memory caches.


From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Mon May 12 00:30:37 2025

On Sun, 11 May 2025 23:46:17 +0000, Lawrence D'Oliveiro wrote:

On Sun, 11 May 2025 07:47:53 -0700, Al Kossow wrote:

With the narrow memory interfaces, are cores starved for memory
bandwidth?

Low memory bandwidth has been a chronic issue since the “wait states” of the 1980s.

That’s why we have memory caches.

Which architects tend to only understand what happens when these
caches are attached to CPUs and not "Joe Random Bus Master".


From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Mon May 12 02:03:44 2025

On Mon, 12 May 2025 1:19:44 +0000, Lawrence D'Oliveiro wrote:

On Mon, 12 May 2025 00:30:37 +0000, MitchAlsup1 wrote:

On Sun, 11 May 2025 23:46:17 +0000, Lawrence D'Oliveiro wrote:

That’s why we have memory caches.

Which architects tend to only understand what happens when these caches
are attached to CPUs and not "Joe Random Bus Master",

One of my pet peeves is disk drives with memory caches in them. Why?

Makes the disk faster, decreasing pressure on the "Disk Cache" in
DRAM.


From Lawrence D'Oliveiro@21:1/5 to All on Mon May 12 01:19:44 2025

On Mon, 12 May 2025 00:30:37 +0000, MitchAlsup1 wrote:

On Sun, 11 May 2025 23:46:17 +0000, Lawrence D'Oliveiro wrote:

That’s why we have memory caches.

Which architects tend to only understand what happens when these caches
are attached to CPUs and not "Joe Random Bus Master",

One of my pet peeves is disk drives with memory caches in them. Why?


From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Mon May 12 08:05:56 2025

Lawrence D'Oliveiro wrote:

On Mon, 12 May 2025 00:30:37 +0000, MitchAlsup1 wrote:

On Sun, 11 May 2025 23:46:17 +0000, Lawrence D'Oliveiro wrote:

That’s why we have memory caches.

Which architects tend to only understand what happens when these caches
are attached to CPUs and not "Joe Random Bus Master",

One of my pet peeves is disk drives with memory caches in them. Why?

For reads it allows the disk to always read full sets of sectors; the following blocks are likely to be needed soon anyway.

For writes, as long as the drive has enough energy (maybe in the form of spinning inertia, or a hefty cap?) to always be able to save the buffer
cache to spinning rust, it can allow operations to complete immediately,
or as soon as the data has been transferred into the disk cache.

Since all disks are using linear sector (or 4K block?) addressing these
days, instead of head/cylinder/sector, a little bit of cache can help
hide the tiny time glitches when the disk has to reposition.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


From Anton Ertl@21:1/5 to Terje Mathisen on Mon May 12 08:41:57 2025

Terje Mathisen <[email protected]> writes:

Lawrence D'Oliveiro wrote:

One of my pet peeves is disk drives with memory caches in them. Why?

For reads it allows the disk to always read full sets of sectors, the following blocks are likely to be needed soon anyway.

Yes.

For writes, as long as the drive has enough energy (maybe in the form of
spinning inertia, or a hefty cap?) to always be able to save the buffer
cache to spinning rust, it can allow operations to complete immediately,
or as soon as the data has been transferred into the disk cache.

I used to think so, but someone (who appeared knowledgeable) corrected
that view and told me that all that HDDs ever do on power loss is to
complete the current sector.

Since all disks are using linear sector (or 4K block?) addressing these
days, instead of head/cylinder/sector, a little bit of cache can help
hide the tiny time glitches when the disk has to reposition.

An alternative is to skew the start of the first sector on the next
track to take the repositioning time into account, and from what I
have read, that is what is done. It allows the drive to complete a sequence of
sectors in one go, and then move on to the next sequence of sectors
elsewhere, without having to wait for another disk rotation to be able
to write the sector that was missed in an unskewed drive.

On SSDs DRAM cache is also used for storing the logical-to-physical
sector mapping of the flash translation layer; accessing it on flash
is apparently too slow. Some DRAM-less SSDs keep that cache in host
DRAM (but AFAIK in an area that is then not accessed by the host).
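
To make the logical-to-physical mapping concrete, here is a toy sketch in Python. It is not how any real SSD firmware works: the class name `ToyFTL` and its fields are hypothetical, and garbage collection, wear levelling and persistence of the map are all left out. It only shows why a fast lookup table is needed on every access.

```python
class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus an in-RAM map."""

    def __init__(self, num_physical_pages):
        self.flash = [None] * num_physical_pages  # simulated flash pages
        self.l2p = {}                              # logical page -> physical page
        self.next_free = 0                         # append-only write pointer

    def write(self, logical_page, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages (a real FTL would garbage-collect)")
        # Flash pages are not overwritten in place, so each write lands on a
        # fresh page and the map is redirected; the old page becomes stale.
        self.flash[self.next_free] = data
        self.l2p[logical_page] = self.next_free
        self.next_free += 1

    def read(self, logical_page):
        # Every read needs a map lookup first, hence the wish to keep it in DRAM.
        return self.flash[self.l2p[logical_page]]
```

Writing logical page 7 twice leaves two physical pages used, with the map pointing at the second one.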

In all of these cases the disk drive cache is not addressable by the
CPU and its coherence with the CPU cache is therefore not an issue.
Its coherence with the permanent state of the drive is an issue,
though. SCSI and SATA drives (and, I guess, NVMe drives, too, but I
have read little about that) support command-queuing interfaces that
allow the file systems to manage this coherence. File systems are not
as good at that as I would like.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>


From Lawrence D'Oliveiro@21:1/5 to All on Mon May 12 22:34:35 2025

On Mon, 12 May 2025 02:03:44 +0000, MitchAlsup1 wrote:

On Mon, 12 May 2025 1:19:44 +0000, Lawrence D'Oliveiro wrote:

One of my pet peeves is disk drives with memory caches in them. Why?

Makes the disk faster, decreasing pressure on the "Disk Cache" in DRAM.

You realize that cache is on the wrong side of a drive interface which is
not designed to run at RAM speeds?


From Lawrence D'Oliveiro@21:1/5 to Terje Mathisen on Mon May 12 22:35:57 2025

On Mon, 12 May 2025 08:05:56 +0200, Terje Mathisen wrote:

Lawrence D'Oliveiro wrote:

One of my pet peeves is disk drives with memory caches in them. Why?

For reads it allows the disk to always read full sets of sectors, the following blocks are likely to be needed soon anyway.

Leave that up to the OS I/O optimization algorithms. Because they know
things about the data that the drive doesn’t.

For writes, as long as the drive has enough energy (maybe in the form of spinning inertia, or a hefty cap?) the always be able to save the buffer cache to spinning rust, it can allow operations to complete immediately,
or as soon as the data has been transferred into the disk cache.

In other words, telling lies to the OS that the write has completed when
it hasn’t. This kind of thing can really stuff up filesystem integrity guarantees.


From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 12 22:39:02 2025

On Mon, 12 May 2025 08:41:57 GMT, Anton Ertl wrote:

On SSDs DRAM cache is also used for storing the logical-to-physical
sector mapping of the flash translation layer; accessing it on flash is apparently too slow.

There is a lot of complicated firmware in SSDs to make them look as much
like a traditional hard drive as possible, so that traditional hard drive filesystems can be used unchanged. This firmware has been known to have
bugs in it.

Whereas the Linux kernel includes a few filesystems purpose-designed for operation on raw flash devices, that integrate wear-levelling etc right
into the block allocation algorithms. Wouldn’t it be much better (more efficient and more reliable) to get rid of most of that firmware layer,
and use these sorts of filesystems directly?


From Stephen Fuld@21:1/5 to Anton Ertl on Mon May 12 17:14:19 2025

On 5/12/2025 1:41 AM, Anton Ertl wrote:

Terje Mathisen <[email protected]> writes:

Lawrence D'Oliveiro wrote:

One of my pet peeves is disk drives with memory caches in them. Why?

For reads it allows the disk to always read full sets of sectors, the
following blocks are likely to be needed soon anyway.

Yes.

For writes, as long as the drive has enough energy (maybe in the form of
spinning inertia, or a hefty cap?) to always be able to save the buffer
cache to spinning rust, it can allow operations to complete immediately,
or as soon as the data has been transferred into the disk cache.

I used to think so, but someone (who appeared knowledgeable) corrected
that view and told me that all that HDDs ever do on power loss is to
complete the current sector.

Caveat - my knowledge on this subject is old and may be obsolete. But it
was correct.

Close. They complete the current sector (so that a subsequent read
doesn't cause a messy error due to partially written sector), then do an emergency retract of the heads. This is done so when the disk stops
spinning and the heads land, they do so in the "landing zone" which is
an area that is a little rougher than the rest of the disk. This
prevents the heads from becoming stuck to the disk surface, a problem
known as "stiction". Without this, when power is restored and the disk
starts spinning again, the heads would stick to the disk surface and be
ripped off the arm, essentially destroying the drive. The rougher
landing zone area of the disk prevents stiction.

If you want the write operation to complete after a power outage, you
need a battery backup.

Since all disks are using linear sector (or 4K block?) addressing these days, instead of head/cylinder/sector, a little bit of cache can help
hide the tiny time glitches when the disk has to reposition.

An alternative is to skew the start of the first sector on the next
track to take the repositioning time into account, and from what I
have read, that is what is done. It allows to complete a sequence of
sectors in one go, and then move on to the next sequence of sectors elsewhere, without having to wait for another disk rotation to be able
to write the sector that was missed in an unskewed drive.

Actually, both a cache and skewing are done. There are two "levels" of
skew. A small skew is used at the end of each track except the last one
on the cylinder to allow for a small "microseek", which lets the drive compensate for the small differences in track location on each surface.
A larger skew is used after the last track on each cylinder to allow the
heads to seek to the next cylinder without, as you say, incurring a full rotation delay.
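
As a rough illustration of the two skew levels, the following Python sketch computes how many sectors pass under the head during a head switch or a cylinder seek, and where the first logical sector of each track ends up. The timings, RPM and sector counts are made-up round numbers, not taken from any real drive, and the function names are hypothetical.

```python
def skew_sectors(switch_time_us, rpm, sectors_per_track):
    """Sectors that pass under the head during switch_time_us, rounded up."""
    # One rotation takes 60_000_000 / rpm microseconds; integer ceiling
    # division avoids floating-point rounding surprises.
    numerator = switch_time_us * rpm * sectors_per_track
    return (numerator + 60_000_000 - 1) // 60_000_000


def first_sector_slot(cylinder, head, heads, head_skew, cyl_skew, sectors_per_track):
    """Angular slot of the first logical sector on a given track."""
    # One small skew per head switch, one larger skew per cylinder change.
    head_switches = cylinder * (heads - 1) + head
    total = head_switches * head_skew + cylinder * cyl_skew
    return total % sectors_per_track


# Assumed figures: 7200 RPM, 500 sectors per track,
# 0.5 ms head switch, 1 ms track-to-track seek.
head_skew = skew_sectors(500, 7200, 500)
cyl_skew = skew_sectors(1000, 7200, 500)
```

With these assumed figures the head skew comes out at 30 sectors and the cylinder skew at 60, so a sequential transfer can continue on the next track without losing a full rotation.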

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Stefan Monnier@21:1/5 to All on Mon May 12 21:50:02 2025

Lawrence D'Oliveiro [2025-05-12 22:35:57] wrote:

On Mon, 12 May 2025 08:05:56 +0200, Terje Mathisen wrote:

For reads it allows the disk to always read full sets of sectors, the
following blocks are likely to be needed soon anyway.

Leave that up to the OS I/O optimization algorithms. Because they know things about the data that the drive doesn’t.

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the platters). Which is why it makes sense for both the OS and the drive to
make their own efforts.

Lawrence D'Oliveiro [2025-05-12 22:39:02] wrote:

On Mon, 12 May 2025 08:41:57 GMT, Anton Ertl wrote:

On SSDs DRAM cache is also used for storing the logical-to-physical
sector mapping of the flash translation layer; accessing it on flash is
apparently too slow.

There is a lot of complicated firmware in SSDs to make them look as
much like a traditional hard drive as possible, so that traditional
hard drive filesystems can be used unchanged. This firmware has been
known to have bugs in it.

Bugs are largely attached to "complicated", yes. This said, I've been
lucky enough not to bump into any of them in my years of use of SSDs.
I admittedly don't push them very hard.

Whereas the Linux kernel includes a few filesystems purpose-designed
for operation on raw flash devices, that integrate wear-levelling etc
right into the block allocation algorithms. Wouldn’t it be much
better (more efficient and more reliable) to get rid of most of that
firmware layer, and use these sorts of filesystems directly?

More reliable, I don't know: to get comparable performance, you'll need comparable complexity, so probably comparable amount of bugs.
Tho I guess by being exposed to many more eyes (by virtue of being Free Software), it could have a chance of being more reliable, maybe.

But in any case, your above argument has some problems:

- Those "few filesystems" aren't nearly good enough to compete with
a normal filesystem running on top of a typical SSD. Simply because
those filesystems have not been designed for those kinds of uses.
Last I checked, they don't scale very well to TB sizes, for example.
And they haven't seen nearly as much work put into avoiding stuttering
and poor performance when the drive is full. More generally, they
haven't received nearly as much attention as has been invested in
SSDs' "FTL".

- The experience with flash technology in the Linux kernel for smaller
devices like home routers and such suggests that doing wear leveling
in the filesystem is a bad idea because you want to do it over the
whole device: no big difference if you have a single filesystem on the
whole drive, but for the general case you want something like UBI,
i.e. a kind of volume-management system that takes care of spreading
the writes over the whole drive as well as remapping defective pages,
while still exposing some of the semantics of flash chips, so you need
non-standard filesystems on top of that.

- For better or for worse, drive manufacturers simply have not given
access to the "raw" flash layer. I'm not completely sure why, but
I get the impression that manufacturers use it as a way to segment the
market, with different prices for the same flash chips combined with
different FTLs.
But maybe at some point, market conditions will change and we'll be
able to buy SSDs that can be accessed directly at the flash level?
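
The point about doing wear leveling over the whole device rather than per filesystem can be sketched in a few lines of Python. This is only a toy in the spirit of a volume-management layer, not UBI's actual algorithm; `ToyWearLeveler` and its methods are hypothetical names.

```python
class ToyWearLeveler:
    """Device-wide allocator that always hands out the least-erased free block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.owner = [None] * num_blocks  # volume name, or None if the block is free

    def allocate(self, volume):
        free = [b for b, owner in enumerate(self.owner) if owner is None]
        if not free:
            raise RuntimeError("device full")
        # Pick across the whole device, regardless of which volume is asking.
        block = min(free, key=lambda b: self.erase_counts[b])
        self.owner[block] = volume
        return block

    def release(self, block):
        # Reusing a flash block requires erasing it, which is what wears it out.
        self.erase_counts[block] += 1
        self.owner[block] = None
```

Because the allocator sees every block, a write-heavy volume and a mostly idle one end up sharing the wear instead of one partition burning out early.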

I agree with you in theory, but in practice I think the potential gain is rather small. Maybe the "block device abstraction" isn't such a bad
choice in the end.

Stefan


From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Tue May 13 03:03:46 2025

On Mon, 12 May 2025 21:50:02 -0400, Stefan Monnier wrote:

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the platters).

No it doesn’t. Consider a journalling filesystem, where the journal entry must be written first before the actual filesystem update. If the drive reorders the writes to do the latter first, there go your filesystem
integrity guarantees.

More reliable, I don't know: to get comparable performance, you'll need comparable complexity ...

Fewer layers ⇒ less complexity.

- Those "few filesystems" aren't nearly good enough to compete with
a normal filesystem running on top of a typical SSD. Simply because
those filesystems have not been designed for those kinds of uses.

Of course they are. You don’t think there are filesystem experts working
on the Linux kernel?


From Stefan Monnier@21:1/5 to All on Mon May 12 23:25:59 2025

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the
platters).

No it doesn’t.

I admire your nuanced understanding of the matter.

More reliable, I don't know: to get comparable performance, you'll
need comparable complexity ...

Fewer layers ⇒ less complexity.

The relation between these two is not nearly that simple.

- Those "few filesystems" aren't nearly good enough to compete with
a normal filesystem running on top of a typical SSD. Simply because
those filesystems have not been designed for those kinds of uses.

Of course they are.

Can you name some?

You don’t think there are filesystem experts working on the
Linux kernel?

Just because your design is optimized for ~100MB devices doesn't mean
you're an idiot who couldn't have made a design optimized for
~1TB devices.

Just like the fact that your design for ~1TB devices sucks rocks
when used on ~100MB devices doesn't make you an idiot.

Stefan


From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Tue May 13 07:40:35 2025

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 12 May 2025 08:05:56 +0200, Terje Mathisen wrote:

Lawrence D'Oliveiro wrote:

One of my pet peeves is disk drives with memory caches in them. Why?

For reads it allows the disk to always read full sets of sectors, the
following blocks are likely to be needed soon anyway.

Leave that up to the OS I/O optimization algorithms. Because they know
things about the data that the drive doesn’t.

In this case the drive knows things that the OS does not: Consider
what happens when the OS has asked for sector N, the arm finally has
settled enough to read from the track, and the first sector it sees is
sector N+10. Now the drive could employ a work-to-rule attitude and
ignore all sectors until it finds sector N, and then send that over to
the OS. The OS takes a little time and asks for N+1. Unfortunately,
your work-to-rule HDD has already rotated a little way into sector
N+1, so it has to wait another rotation for N+1, and so on. By
contrast, a read-caching HDD will read the whole track in one
rotation, and then deliver sector N and all the other sectors from
that track that the OS asks for out of the cache.
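
The difference between the two behaviours can be put into numbers with a small Python model. It is a sketch under stated assumptions: time is counted in ticks with 10 ticks per sector, the host needs `gap` ticks between receiving one sector and asking for the next, transfers out of the cache are treated as instantaneous, and all requested sectors lie on one track. The function names are hypothetical.

```python
def work_to_rule_ticks(sectors_per_track, arrive, first, count, gap, sector_ticks=10):
    """Ticks to deliver `count` sectors one request at a time, with no caching."""
    track_ticks = sectors_per_track * sector_ticks
    pos = arrive * sector_ticks  # angular head position, in ticks
    now = 0
    for i in range(count):
        target = ((first + i) % sectors_per_track) * sector_ticks
        wait = (target - pos) % track_ticks  # rotate until the sector starts
        now += wait + sector_ticks           # then read it
        pos = (target + sector_ticks) % track_ticks
        if i < count - 1:
            now += gap                       # host thinks, disk keeps spinning
            pos = (pos + gap) % track_ticks
    return now


def track_cache_ticks(sectors_per_track, arrive, first, count, gap, sector_ticks=10):
    """Same requests, but the drive reads the whole track from where it lands."""
    now = 0
    for i in range(count):
        sector = (first + i) % sectors_per_track
        # A sector is in the cache once the head has passed over it.
        cached_at = ((sector - arrive) % sectors_per_track + 1) * sector_ticks
        now = max(now, cached_at)
        if i < count - 1:
            now += gap
    return now


# 30 sectors per track, head lands at N+10 (N taken as 0), host asks for 3 sectors.
slow = work_to_rule_ticks(30, 10, 0, 3, gap=2)
fast = track_cache_ticks(30, 10, 0, 3, gap=2)
```

In this model the work-to-rule drive needs 830 ticks, almost three rotations, while the read-caching drive is done after 230 ticks, less than one rotation.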

For writes, as long as the drive has enough energy (maybe in the form of
spinning inertia, or a hefty cap?) the always be able to save the buffer
cache to spinning rust, it can allow operations to complete immediately,
or as soon as the data has been transferred into the disk cache.

In other words, telling lies to the OS that the write has completed when
it hasn’t.

Not necessarily. When you do command queuing and report the writes as completed only when they actually have completed, you also need RAM on
the drive for storing all the queued writes and their data.

This kind of thing can really stuff up filesystem integrity
guarantees.

Which file system offers any useful integrity guarantees? When I last
looked in the Linux docs, the only file system giving any integrity
guarantees at all was NILFS2.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>


From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Tue May 13 08:12:58 2025

On Tue, 13 May 2025 07:40:35 GMT, Anton Ertl wrote:

In this case the drive knows things that the OS does not: Consider that
the OS asked for sector N what happens when the arm finally has settled enough to read from the track, and the first sector it sees is sector
N+10.

The OS isn’t likely to ask for one sector at a time. It will use time-
tested algorithms like scatter-read/gather-write and elevator seeking.

This is why we have filesystem caches.


From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 13 08:39:35 2025

On 5/12/2025 8:03 PM, Lawrence D'Oliveiro wrote:

On Mon, 12 May 2025 21:50:02 -0400, Stefan Monnier wrote:

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the
platters).

No it doesn’t. Consider a journalling filesystem, where the journal entry must be written first before the actual filesystem update. If the drive reorders the writes to do the latter first, there go your filesystem integrity guarantees.

First of all, you said that there aren't things that the drive knows but
the host doesn't. This isn't true (e.g. which disk sectors are
defective and had to be relocated), and the rest of your response
doesn't refute that, but talks about file system stuff. Secondly, if a particular write (in your example, the journal write) has to be completed
before the other write, the disk provides that capability. It has
nothing to do with what the disk knows.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 13 08:33:15 2025

On 5/13/2025 1:12 AM, Lawrence D'Oliveiro wrote:

On Tue, 13 May 2025 07:40:35 GMT, Anton Ertl wrote:

In this case the drive knows things that the OS does not: Consider that
the OS asked for sector N what happens when the arm finally has settled
enough to read from the track, and the first sector it sees is sector
N+10.

The OS isn’t likely to ask for one sector at a time.

Frequently true, so consider this related scenario. The host requests a
read of 10 sectors starting at sector N. When the head settles, the
next sector is N+6. Without any in drive buffering, it would wait
almost a full revolution till record N comes under the head.

With buffering, but no cache, the drive reads records N+6 to N+9 into the buffer, then waits until the disk rotates to record N and begins the
host transfer. This is an improvement because the transfer to the host
is faster than the transfer from the disk, and the last 4 sectors can be transferred out of the buffer without waiting for the disk, so the
transfer is completed faster.

Now consider caching. It is similar, but after record N+9 the drive
continues reading into the cache. Let's say there are 30 records on this
track. If it reads all of the data into the cache, then proceeds as
above once the disk rotates to record N, it has cost zero extra time. If
the host then issues another 10-sector read sequential to the initial
one (or actually for any sectors from N+10 to N+29), this can be satisfied
out of the cache without any drive delay, so much faster than without
the cache, and the heads can be moved away to start satisfying another
unrelated request. There is minimal cost and substantial benefit.

Now you have argued that the file system cache should take care of that, presumably issuing prefetch reads for the next sectors. This will work,
of course, but has some disadvantages relative to using the drive cache.
Specifically, since it is unlikely the prefetch request will be
received by the drive before record N+10 has passed the heads, it will
incur most of an additional rotational delay, which will tie up the
drive, preventing it from responding to some other request.

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful addition,
and since the drive has to have some DRAM anyway for other reasons, the
cost is minimal. You can think of the drive cache as the "next level"
cache behind the host based cache.
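
The scenarios above can be roughly quantified with a short Python sketch, using the numbers from the example (30 sectors per track, a 10-sector read at N, head settling at N+6). Time is in sector-times, N is taken as 0, the host transfer is assumed to keep up with the media, and the follow-up request is assumed to arrive one sector-time after the first read completes. `media_time` is a hypothetical helper, not a model of any real drive.

```python
def media_time(sectors_per_track, head_at, first, count, buffered):
    """Sector-times until all requested sectors have been read off the platter."""
    if buffered:
        # Any requested sector that passes under the head is captured at once.
        offsets = [(first + i - head_at) % sectors_per_track for i in range(count)]
        return max(offsets) + 1
    # Unbuffered: wait for the first sector, then stream the rest in order.
    return (first - head_at) % sectors_per_track + count


S, HEAD, N = 30, 6, 0
no_buffer = media_time(S, HEAD, N, 10, buffered=False)
with_buffer = media_time(S, HEAD, N, 10, buffered=True)

# Follow-up read of N+10..N+19, arriving one sector-time later, no drive cache:
head_after = (HEAD + no_buffer + 1) % S
followup_no_cache = media_time(S, head_after, N + 10, 10, buffered=False)
# With a full-track cache those sectors were already read during the first
# rotation, so the follow-up needs no media time at all.
followup_cached = 0
```

Under these assumptions the first read takes 34 sector-times unbuffered and 30 buffered, and the follow-up costs another 39 sector-times without a drive cache versus none with one.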

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From EricP@21:1/5 to Lawrence D'Oliveiro on Tue May 13 12:55:46 2025

Lawrence D'Oliveiro wrote:

On Mon, 12 May 2025 21:50:02 -0400, Stefan Monnier wrote:

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the
platters).

No it doesn’t.

Yes the HDD controller does know more.
For 40 or so years HDDs have done automatic bad sector and bad track replacement. Each track has a few spare sectors and each zone
has a number of spare tracks.
They also use zoned recording where the number of
sectors per track changes as you move out from the center.
So you won't know which sectors are on the same track.
Also as Anton mentioned rotational optimization where you write
logical sectors 1,2,3,4 but they are actually written 3,4,<wait>,1,2,
or even 3,4,<wait>,2,<seek>,1.

The HDD controller is responsible for mapping the logical sector
numbers onto the real physical disk sectors using those internal
maps and then optimizing the head seeks.
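
A minimal Python sketch of such a mapping, assuming a made-up zone table and remap list (real drives keep far more elaborate internal tables, and the names here are hypothetical), shows why the host cannot tell from logical sector numbers alone which sectors share a track:

```python
def lba_to_physical(lba, zones, remapped):
    """Map a logical sector number to (zone, track, sector) or a spare location.

    zones:    list of (tracks_in_zone, sectors_per_track), outermost zone first
    remapped: dict of logical sector -> replacement location for bad sectors
    """
    if lba in remapped:
        return remapped[lba]
    remaining = lba
    for zone_index, (tracks, sectors_per_track) in enumerate(zones):
        capacity = tracks * sectors_per_track
        if remaining < capacity:
            track, sector = divmod(remaining, sectors_per_track)
            return (zone_index, track, sector)
        remaining -= capacity
    raise ValueError("logical sector beyond end of disk")


# Tiny assumed layout: an outer zone with 8 sectors per track,
# an inner zone with 6, and one bad sector moved to a spare.
zones = [(2, 8), (2, 6)]
remapped = {5: ("spare", 0, 0)}
```

Here logical sectors 15 and 16 are neighbours for the host but sit in different zones, and sector 5 lives somewhere else entirely.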

Consider a journalling filesystem, where the journal entry
must be written first before the actual filesystem update. If the drive reorders the writes to do the latter first, there go your filesystem integrity guarantees.

For those drives that accept multiple queued commands,
they are _defined_ as being reordered by the drive controller.
If you don't want two commands reordered then submit them serially
and wait for each completion status.
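
At the application level, the same "submit serially and wait" discipline looks roughly like the Python sketch below. It is an analogy using ordinary file APIs, not the drive's command queue: `os.fsync` asks the OS to flush the file, and whether that also flushes the drive's volatile write cache depends on the OS, the file system and the drive settings. The helper names and file names are hypothetical.

```python
import os
import tempfile


def write_and_wait(path, payload):
    """Append payload and block until the OS reports the file as flushed."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache for this file


def journaled_update(journal_path, data_path, entry, payload):
    # Step 1: the journal entry must be complete before step 2 is even issued.
    write_and_wait(journal_path, entry)
    # Step 2: only now submit the actual update.
    write_and_wait(data_path, payload)


def demo():
    with tempfile.TemporaryDirectory() as d:
        journal = os.path.join(d, "journal")
        data = os.path.join(d, "data")
        journaled_update(journal, data, b"intent: update block 7\n", b"new contents\n")
        with open(journal, "rb") as j, open(data, "rb") as f:
            return j.read(), f.read()
```

The cost is that the second write cannot overlap with the first, which is exactly the trade-off between ordering and throughput.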

File system integrity has to take many failure modes into account.
E.g. sectors are 512B but a file system block is 4kB or 8 sectors.
On power fail the drive finishes writing only its current sector
with a valid checksum. Only part of the block was written but its
sector checksums are all correct. File systems can add block checksums
to their meta data blocks to catch this.
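
A block checksum of this kind can be sketched in Python as follows. The layout (CRC32 in the last four bytes of a 4 kB block) is an assumption made for illustration, not the format of any particular file system, and `torn_write` merely simulates a power failure after a given number of 512 B sectors.

```python
import zlib

SECTOR = 512
BLOCK = 4096


def seal_block(payload):
    """Return a 4096-byte block: zero-padded payload plus CRC32 in the last 4 bytes."""
    if len(payload) > BLOCK - 4:
        raise ValueError("payload too large for one block")
    body = payload.ljust(BLOCK - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "little")


def block_is_intact(block):
    """Check the whole-block checksum; per-sector checksums cannot catch tearing."""
    if len(block) != BLOCK:
        return False
    body, stored = block[:-4], block[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == stored


def torn_write(old, new, sectors_written):
    """Simulate power failing after only the first sectors of `new` hit the disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


old = seal_block(b"journal entry v1")
new = seal_block(b"journal entry v2")
torn = torn_write(old, new, 1)
```

Every sector of `torn` is individually valid, yet `block_is_intact(torn)` is false because the new data in the first sector no longer matches the old checksum at the end of the block.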

To allow recovery from power fail or crash, which parts of the file
system IO can be overlapped in parallel and which must be serialized
is part of resilient file system and database design.

More reliable, I don't know: to get comparable performance, you'll need
comparable complexity ...

Fewer layers ⇒ less complexity.

Modularity => less complexity due to fewer interactions between components.
Fewer layers => more interactions => more permutations and combinations.

SSDs each have a different internal design.
Your approach would require the OS to know about the internal details
of every flash chip down to the chip rev level for every manufacturer.

- Those "few filesystems" aren't nearly good enough to compete with
a normal filesystem running on top of a typical SSD. Simply because
those filesystems have not been designed for those kinds of uses.

Of course they are. You don’t think there are filesystem experts working
on the Linux kernel?

An area you may be thinking of, where having the file system directly
control the low level SSD might benefit, is putting a log structured
file system directly onto an SSD.

Currently the SSD's Flash Translation Layer (FTL) does work to
make physically discontiguous blocks look logically contiguous, with FTL blocks storing all that mapping metadata;
then the log-structured file system does its own work to map those logically
contiguous blocks back to their associated files.

This would probably be simpler with just one software layer,
but that would require either the file system know about flash details,
or flash have a full file system.


From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Wed May 14 00:18:02 2025

On Tue, 13 May 2025 08:33:15 -0700, Stephen Fuld wrote:

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful
addition ...

You can tell that’s wrong because the drive cache is slower than the OS filesystem cache. Putting a slower cache in series with a faster one is a
waste of time ... unless the slower cache is much larger.

This is why, for example, we typically have 3 levels of RAM cache between
the CPU and main memory these days. There is a factor of about 100:1 in relative speeds, so to bridge the gap we need multiple caches of various intermediate speeds, and you will notice their sizes are inversely related
to their speeds.

A drive cache can never be as big as main RAM on a modern PC. That’s why
the drive cache is useless.


From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 13 17:49:17 2025

On 5/13/2025 5:18 PM, Lawrence D'Oliveiro wrote:

On Tue, 13 May 2025 08:33:15 -0700, Stephen Fuld wrote:

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful
addition ...

You can tell that’s wrong because the drive cache is slower than the OS filesystem cache.

I don't think that proves anything. I gave you an example of where the
drive cache speeded up the sequence of two reads to where it was faster
than two reads into the file cache.

Putting a slower cache in series with a faster one is a waste of time ... unless the slower cache is much larger.

See my counter example posted earlier.

This is why, for example, we typically have 3 levels of RAM cache between
the CPU and main memory these days. There is a factor of about 100:1 in relative speeds, so to bridge the gap we need multiple caches of various intermediate speeds, and you will notice their sizes are inversely related
to their speeds.

A drive cache can never be as big as main RAM on a modern PC. That’s why the drive cache is useless.

You haven't refuted my example, and besides the comparison you give here
is not meaningful because you can't use all the main ram as a file cache.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed May 14 13:31:50 2025

Lawrence D'Oliveiro <[email protected]d> writes:

On Tue, 13 May 2025 08:33:15 -0700, Stephen Fuld wrote:

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful
addition ...

You can tell that’s wrong because the drive cache is slower than the OS filesystem cache. Putting a slower cache in series with a faster one is a waste of time ... unless the slower cache is much larger.

You really have no clue what you're talking about. Disk drive caches were never intended to be extensions of the host cache structure.


From Scott Lurndal@21:1/5 to John Levine on Wed May 14 18:12:12 2025

John Levine <[email protected]> writes:

According to Scott Lurndal <[email protected]>:

Lawrence D'Oliveiro <[email protected]d> writes:
[something]

You really have no clue what you're talking about. Disk drive caches were
never intended to be extensions of the host cache structure.

Er, you do know who you're arguing with?

Yes, I do generally try to avoid feeding the trolls. My bad.

From John Levine@21:1/5 to All on Wed May 14 18:01:23 2025

According to Scott Lurndal <[email protected]>:

Lawrence D'Oliveiro <[email protected]d> writes:
[something]

You really have no clue what you're talking about. Disk drive caches were
never intended to be extensions of the host cache structure.

Er, you do know who you're arguing with?

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Thomas Koenig@21:1/5 to John Levine on Wed May 14 18:45:09 2025

John Levine <[email protected]> schrieb:

According to Scott Lurndal <[email protected]>:

Lawrence D'Oliveiro <[email protected]d> writes:
[something]

You really have no clue what you're talking about. Disk drive caches were
never intended to be extensions of the host cache structure.

Er, you do know who you're arguing with?

I have adjusted my scorefile so I only get to read the intelligent
parts of these conversations.

From Lawrence D'Oliveiro@21:1/5 to All on Sat May 17 23:57:50 2025

On Mon, 12 May 2025 22:35:57 -0000 (UTC), I wrote:

In other words, telling lies to the OS that the write has completed
when it hasn’t. This kind of thing can really stuff up filesystem
integrity guarantees.

What do you know, it turns out there is this feature, originally from
SCSI and part of NCQ, called “Force Unit Access” (FUA) <https://en.wikipedia.org/wiki/Disk_buffer#FUA>.

But do we really know that the drive vendors will honour this command
... ?
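
For reference, this is roughly what asking for a durable write looks like from the application side. It is only a sketch using Python's standard `os` calls; the helper name `durable_write` and the file name are made up for illustration. Whether the kernel turns the flush into a cache-flush command or a FUA write depends on the filesystem and the device, and nothing in this code can verify that the drive really did it, which is exactly the question above.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and ask the OS to push it to stable storage.

    Hypothetical helper for illustration. os.fsync() returns once the OS
    reports the data as flushed; it cannot prove the drive honoured it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        os.fsync(fd)  # request that the file's data reach the device
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.dat")  # made-up name
    print(durable_write(target, b"hello"))
```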

From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Sun May 18 01:35:54 2025

On Tue, 13 May 2025 17:49:17 -0700, Stephen Fuld wrote:

On 5/13/2025 5:18 PM, Lawrence D'Oliveiro wrote:

You can tell that’s wrong because the drive cache is slower than the OS
filesystem cache.

I don't think that proves anything. I gave you an example of where the
drive cache speeded up the sequence of two reads to where it was faster
than two reads into the file cache.

Think of what a cache is for in the first place. The only reason they work
is because of the “principle of locality”. This can also be expressed as saying that typical patterns of data access by application programs follow
a Pareto distribution, less formally known by monikers like the “80/20 rule” or the “90/10 rule”.

Let’s use the “90/10 rule” name just to put some concrete numbers on the whole idea: this says that 90% of (recent) data accesses will be to just
10% of the data.

For “recent”, let’s say “within the last 10 seconds”. So the OS cache should be big enough to hold that 10% of the data that the app accessed in about the last 10 seconds. Of course the precise set of frequently-
accessed data (the “working set”) will drift over time. So anything beyond this last-10-seconds’ worth will require hitting the actual disk device, which will happen eventually.

At that point, the cache on the disk device is going to come into play.
Trouble is it’s nowhere near this kind of size: it can only hold enough
data for, say, about the last 1 second’s worth of accesses. So by the time the OS cache experiences a miss like this, the disk cache isn’t going to
be able to help much, since it is more likely than not going to miss as
well. To be useful, it needs to be about 10 times the size of the OS
cache, so it can hold more of the 90% of the data that the app wants to
access less frequently.

But you can’t get drives like that.

From Lynn Wheeler@21:1/5 to Lawrence D'Oliveiro on Sun May 18 14:10:00 2025

Lawrence D'Oliveiro <[email protected]d> writes:

Think of what a cache is for in the first place. The only reason they work
is because of the “principle of locality”. This can also be expressed as saying that typical patterns of data access by application programs follow
a Pareto distribution, less formally known by monikers like the “80/20 rule” or the “90/10 rule”.

IBM "added" full-track "-13" cache to 3880 dasd control for 3380 disk
(ten records/track) ... claiming 90% "hit rate". Issue was that there
was a lot of sequential file reading ... the 1st record read for track
would be a "miss" but bring in the whole track, resulting in the next
nine reads being "hits".

system services offered option for application doing sequential i/o to
specify full-track i/o (into processor memory) ... which would result
in the zero hit rate for the controller cache (IBM standard batch
operating system did contiguous allocation on file creation).
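
The two hit rates above can be reproduced by simple counting. The sketch below assumes ten records per track and a cache that holds a single track and loads all of it on a miss; the function names and the one-track model are simplifications for illustration, not a description of the actual 3880 logic.

```python
def sequential_hit_rate(num_records, records_per_track=10):
    """Hit rate when an application reads one record at a time, in order."""
    cached_track = None
    hits = 0
    for record in range(num_records):
        track = record // records_per_track
        if track == cached_track:
            hits += 1             # record came in with its track earlier
        else:
            cached_track = track  # miss: the whole track is brought in
    return hits / num_records


def full_track_hit_rate(num_tracks):
    """Hit rate when the application itself reads one whole track per I/O."""
    cached_track = None
    hits = 0
    for track in range(num_tracks):
        if track == cached_track:
            hits += 1
        else:
            cached_track = track  # every request is for a new track
    return hits / num_tracks


if __name__ == "__main__":
    print(sequential_hit_rate(1000))  # 0.9
    print(full_track_hit_rate(100))   # 0.0
```

With record-at-a-time reads, nine of every ten requests find their data already in the cache; with full-track reads into processor memory the controller cache never sees a repeat.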

About the same time, we did system mod. that did highly efficient
trace/capture of every record operation which was deployed on numerous production systems. Then traces were fed to sophisticated simulator that
could vary algorithms, caches, kinds of caches, sizes of caches,
distribution of caches, etc.

Given a fixed amount of cache storage, it was always better to have a
global system cache ... than partitioned/distributed; except a few edge
cases. Example, if device track cache could be used to immediately
start transferring data, rather than having to rotate to start of track
before starting transfer.

--
virtualization experience starting Jan1968, online at home since Mar1970

From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Mon May 19 00:18:23 2025

On Sun, 18 May 2025 1:35:54 +0000, Lawrence D'Oliveiro wrote:

On Tue, 13 May 2025 17:49:17 -0700, Stephen Fuld wrote:

On 5/13/2025 5:18 PM, Lawrence D'Oliveiro wrote:

You can tell that’s wrong because the drive cache is slower than the OS
filesystem cache.

I don't think that proves anything. I gave you an example of where the
drive cache speeded up the sequence of two reads to where it was faster
than two reads into the file cache.

Think of what a cache is for in the first place. The only reason they work
is because of the “principle of locality”. This can also be expressed as
saying that typical patterns of data access by application programs follow
a Pareto distribution, less formally known by monikers like the “80/20
rule” or the “90/10 rule”.

Let’s use the “90/10 rule” name just to put some concrete numbers on the
whole idea: this says that 90% of (recent) data accesses will be to just
10% of the data.

For “recent”, let’s say “within the last 10 seconds”.

Principle is reasonable, specified time is not.

A 4MB cache can be filled in 1ms when memory performs at (only) 4GB/s
(and the memory systems are up in the 40GB/s peak territory while
caches remain rather small (several GB at most).)
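
The arithmetic behind that figure, as a small sketch. The function name is made up for illustration, and decimal megabytes and gigabytes are assumed.

```python
MB = 10**6  # decimal units assumed
GB = 10**9


def fill_time_ms(cache_bytes, bandwidth_bytes_per_s):
    """Milliseconds needed to stream cache_bytes at the given bandwidth."""
    return cache_bytes * 1000.0 / bandwidth_bytes_per_s


if __name__ == "__main__":
    print(fill_time_ms(4 * MB, 4 * GB))   # 1.0
    print(fill_time_ms(4 * MB, 40 * GB))  # 0.1
```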

So the OS cache should be big enough to hold that 10% of the data that the
app accessed in about the last 10 seconds. Of course the precise set of
frequently-accessed data (the “working set”) will drift over time. So
anything beyond this last-10-seconds’ worth will require hitting the actual
disk device, which will happen eventually.

Just change 10 seconds to 10 milliseconds and I have nothing to complain
about.

At that point, the cache on the disk device is going to come into play.
Trouble is it’s nowhere near this kind of size: it can only hold enough
data for, say, about the last 1 second’s worth of accesses. So by the time
the OS cache experiences a miss like this, the disk cache isn’t going to
be able to help much, since it is more likely than not going to miss as
well. To be useful, it needs to be about 10 times the size of the OS
cache, so it can hold more of the 90% of the data that the app wants to
access less frequently.

But you can’t get drives like that.

From Lawrence D'Oliveiro@21:1/5 to All on Mon May 19 01:13:01 2025

On Mon, 19 May 2025 00:18:23 +0000, MitchAlsup1 wrote:

On Sun, 18 May 2025 1:35:54 +0000, Lawrence D'Oliveiro wrote:

For “recent”, let’s say “within the last 10 seconds”.

Principle is reasonable, specified time is not.

A 4MB cache can be filled in 1ms when memory performs at (only) 4GB/s
(and the memory systems are up in the 40GB/s peak territory while caches remain rather small (several GB at most).)

Filesystem cache turnover happens in times on the order of seconds, not milliseconds.

From Stephen Fuld@21:1/5 to Lynn Wheeler on Sun May 18 18:41:38 2025

On 5/18/2025 5:10 PM, Lynn Wheeler wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

Think of what a cache is for in the first place. The only reason they work
is because of the “principle of locality”. This can also be expressed as
saying that typical patterns of data access by application programs follow
a Pareto distribution, less formally known by monikers like the “80/20
rule” or the “90/10 rule”.

IBM "added" full-track "-13" cache to 3880 dasd control for 3380 disk
(ten records/track) ... claiming 90% "hit rate". Issue was that there
was a lot of sequential file reading ... the 1st record read for track
would be a "miss" but bring in the whole track, resulting in the next
nine reads being "hits".

system services offered option for application doing sequential i/o to specify full-track i/o (into processor memory) ... which would result
in the zero hit rate for the controller cache (IBM standard batch
operating system did contiguous allocation on file creation).

About the same time, we did system mod. that did highly efficient trace/capture of every record operation which was deployed on numerous production systems. Then traces were fed to sophisticated simulator that could vary algorithms, caches, kinds of caches, sizes of caches,
distribution of caches, etc.

Given a fixed amount of cache storage, it was always better to have a
global system cache ... than partitioned/distributed; except a few edge
cases. Example, if device track cache could be used to immediately
start transferring data, rather than having to rotate to start of track
before starting transfer.

It didn't make sense to, after executing the full track read, since the
disk was positioned at the end of the track, and you had a good
indication that it was a sequential file read, to start caching the next
track in anticipation of the next full track read?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Vir Campestris@21:1/5 to Stephen Fuld on Mon May 19 21:46:15 2025

On 19/05/2025 02:41, Stephen Fuld wrote:

It didn't make sense to, after executing the full track read, since the
disk was positioned at the end of the track, and you had a good
indication that it was a sequential file read, to start caching the next track in anticipation of the next full track read?

It might if the disk was idle. But if there is a queue from another
process it probably will be better to do that instead.

Andy

--
Do not listen to rumour, but, if you do, do not believe it.
Ghandi.

From Stephen Fuld@21:1/5 to Vir Campestris on Mon May 19 14:58:48 2025

On 5/19/2025 1:46 PM, Vir Campestris wrote:

On 19/05/2025 02:41, Stephen Fuld wrote:

It didn't make sense to, after executing the full track read, since
the disk was positioned at the end of the track, and you had a good
indication that it was a sequential file read, to start caching the
next track in anticipation of the next full track read?

It might if the disk was idle. But if there is a queue from another
process it probably will be better to do that instead.

I presume you know that the 3880 controller did not do what today we
call command queuing, so I think you were referring to a potential queue
in the host. That being the case, the controller doesn't know if there
is a queue or not. So given that, why not start reading record 1 on the
next track. If a request comes in, you can abandon the read to service
the request - no harm, no foul. If there isn't, and you subsequently
get a request for that track, it's a big win. The only potential loss
is if you get a request for the track that was LRU and got pushed out of
the cache.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Mon May 19 23:33:42 2025

On Mon, 19 May 2025 1:13:01 +0000, Lawrence D'Oliveiro wrote:

On Mon, 19 May 2025 00:18:23 +0000, MitchAlsup1 wrote:

On Sun, 18 May 2025 1:35:54 +0000, Lawrence D'Oliveiro wrote:

For “recent”, let’s say “within the last 10 seconds”.

Principle is reasonable, specified time is not.

A 4MB cache can be filled in 1ms when memory performs at (only) 4GB/s
(and the memory systems are up in the 40GB/s peak territory while caches
remain rather small (several GB at most).)

Filesystem cache turnover happens in times on the order of seconds, not milliseconds.

Yes, running at the latency and bandwidth of the disk/SSD subsystem
not of the DRAM system.

From Lawrence D'Oliveiro@21:1/5 to All on Tue May 20 00:36:11 2025

On Mon, 19 May 2025 23:33:42 +0000, MitchAlsup1 wrote:

On Mon, 19 May 2025 1:13:01 +0000, Lawrence D'Oliveiro wrote:

Filesystem cache turnover happens in times on the order of seconds, not
milliseconds.

Yes, running at the latency and bandwidth of the disk/SSD subsystem not
of the DRAM system.

It’s not really about latency, it’s just about the way the filesystem caches are used. They quite typically take up a big chunk of the RAM on a server, and stuff does tend to hang around in them for several seconds, to
get good cache hits. (Though on Linux that’s considered a low-priority
use, so if a regular application needs more RAM and there isn’t enough
free, then the filesystem cache will be quickly flushed as necessary to
make more available.)

And consider that modern high-performance servers are available with
terabytes of RAM. Fill a few tens of % of that with filesystem cache, and
you see why disk caches don’t stand a chance.

From Lawrence D'Oliveiro@21:1/5 to quadibloc on Tue May 20 00:43:41 2025

On Mon, 19 May 2025 21:33:11 +0000, quadibloc wrote:

Parallel programming isn't "hard" or "easy".

Concurrent processes/threads, running either fully concurrently on
separate processors, or with asynchronous preemption on shared processors,
are hard to program. That’s where the term “heisenbug” came from, where you had subtle, intermittent, timing-related misbehaviour that, for
example, tended to go away when you changed the code to try to narrow the problem down.

In the 1990s, multithreading became popular, and people wanted to use it
for everything -- even GUIs. That turned out to be a bad idea. So nowadays
we recognize that threads are mainly useful to increase performance for CPU-bound code, and not much else.

We also now have the async/await paradigm, also known as “stackless coroutines”. Like threads, this lets us manage multiple concurrent
activities at once. Unlike threads, preemption is always explicit, which
gets rid of most possibilities for race conditions. This technique is very useful where the performance bottleneck lies elsewhere than the CPU: i.e.
the filesystem, network, or just waiting for the user to click the mouse
or press the next key.
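
A minimal sketch of the "explicit preemption" point, using Python's `asyncio` (the choice of language and the names `worker` and `main` are just for illustration). Two tasks update shared state, but control can only pass from one to the other at an `await`, so the read-modify-write in between needs no lock.

```python
import asyncio


async def worker(state, rounds):
    """Increment a shared counter; task switches happen only at 'await'."""
    for _ in range(rounds):
        current = state["count"]      # read ...
        state["count"] = current + 1  # ... and write, with no await between
        await asyncio.sleep(0)        # explicit point where others may run


async def main():
    state = {"count": 0}
    await asyncio.gather(worker(state, 1000), worker(state, 1000))
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(main()))  # 2000
```

The same read-then-write sequence in two preemptively scheduled threads would need a lock to be safe, since a switch could in principle land between the two statements.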

From Vir Campestris@21:1/5 to Stephen Fuld on Tue May 20 11:22:26 2025

On 19/05/2025 22:58, Stephen Fuld wrote:

On 5/19/2025 1:46 PM, Vir Campestris wrote:

On 19/05/2025 02:41, Stephen Fuld wrote:

It didn't make sense to, after executing the full track read, since
the disk was positioned at the end of the track, and you had a good
indication that it was a sequential file read, to start caching the
next track in anticipation of the next full track read?

It might if the disk was idle. But if there is a queue from another
process it probably will be better to do that instead.

I presume you know that the 3880 controller did not do what today we
call command queuing, so I think you were referring to a potential queue
in the host. That being the case, the controller doesn't know if there
is a queue or not. So given that, why not start reading record 1 on the next track. If a request comes in, you can abandon the read to service
the request - no harm, no foul. If there isn't, and you subsequently
get a request for that track, it's a big win. The only potential loss
is if you get a request for the track that was LRU and got pushed out of
the cache.

The only IBM machine I've ever even touched was a PC/XT - so no, I didn't
know. That makes sense.

My mainframe background goes back to the ICL 2900 at the beginning of
the '80s. There were two families of disc controllers, one of which did
clever stuff like retries for the host (I don't remember if it cached)
and the other one was dumb and cheap. The dumb one was becoming more
popular at the time when I discovered the realities of working in tech
and had practice in CV writing...

Andy

--
Do not listen to rumour, but, if you do, do not believe it.
Ghandi.

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue May 20 13:53:52 2025

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 19 May 2025 21:33:11 +0000, quadibloc wrote:

Parallel programming isn't "hard" or "easy".

Concurrent processes/threads, running either fully concurrently on
separate processors, or with asynchronous preemption on shared processors,
are hard to program.

Clearly you are someone who has never done it.

From Stefan Monnier@21:1/5 to All on Tue May 20 10:49:54 2025

You need RAM to make this work, and a few MB of HDD side cache isn't a huge cost if one would have already needed the same stuff to make the HDD work.

Indeed, AFAIK, what we call "HDD cache" is actually just the RAM used
by the embedded CPU inside the drive for its operation.
I expect this is used to store the information about in-flight requests
(e.g. most importantly the data about the write requests received but
that haven't yet reached the platters), but I also expect it holds data
that happened to fly recently by the read-head, just in case.

Stefan

From Stephen Fuld@21:1/5 to BGB on Tue May 20 11:35:15 2025

On 5/20/2025 10:42 AM, BGB wrote:

On 5/20/2025 9:49 AM, Stefan Monnier wrote:

You need RAM to make this work, and a few MB of HDD side cache isn't a
huge cost if one would have already needed the same stuff to make the HDD
work.

Indeed, AFAIK, what we call "HDD cache" is actually just the RAM used
by the embedded CPU inside the drive for its operation.
I expect this is used to store the information about in-flight requests
(e.g. most importantly the data about the write requests received but
that haven't yet reached the platters), but I also expect it holds data
that happened to fly recently by the read-head, just in case.

Probably.

Again, a caveat that my knowledge is pretty old, and may have been
superseded.

The drive certainly needs some DRAM for its internal operations, but as
has been pointed out, the cost of making this larger and using that
extra for cache is pretty small. I just checked and a current Seagate
drive has 512 MB of cache. That cache is used on both reads and writes.

For reads, the vendor can specify the max amount to prefetch, which then determines the number of segments the cache is divided into. i.e. you
might not want to cache the rest of the track since that might force not keeping the data from a previous read. You can also specify the minimum
to prefetch, which can be zero, which determines how soon the drive can
move the heads for another request.

For writes, even if no write caching is allowed, the DRAM is used to
accept the write data before the heads have arrived at the requested
track and the disk has spun to the required sector. This allows for a
faster response in the case that the heads arrive on track in the
"middle" of the requested write area so the drive can write the last
part of the transfer to the disk before the first part. (i.e. you
overlap the time to transfer the last part of the request to the disk
with the rotation.) Of course, if write caching is enabled, the DRAM is
used to hold the write data until it can be written to the disk.

As I understand it, it is this, along with a certain amount of "read prefetch", which is granted, typically the data for the rest of the
track as the drive spins around;

Perhaps. See above.

And, keeping some copies of previously read content around, which can be
read again from this cache if they happen to be requested.

As I understand it, also more modern HDDs tend to be "density per area" rather than angular slices (as it was on much older HDDs and floppies,
*), so there would be more sectors on outer tracks vs on inner tracks.

This has been done for well over 30 years. It allows the disk capacity
to be increased by about 1/3 without any increased cost of the disk,
hence lowers cost per gigabyte of the disk. Of course it requires an
interface, such as originally SCSI but now also SATA, that is block
oriented, not cyl/head/record oriented.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Stefan Monnier@21:1/5 to All on Tue May 20 16:06:13 2025

Personally, I rarely use multi-threading, and when I do, it is usually in
the form of using mutex locks over shared buffers.
You lock the mutex if needed to copy data from one thread to another; or
when doing a task that depends on the data being consistent.

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.
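
For concreteness, the mutex-over-a-shared-buffer pattern described in the quoted text might look like the sketch below. It uses Python's `threading.Lock`; the `SharedBuffer` class and the `run` helper are hypothetical names for illustration, not anyone's actual code.

```python
import threading


class SharedBuffer:
    """A list guarded by a mutex (hypothetical helper for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def put_many(self, items):
        with self._lock:           # hold the lock while copying data in
            self._items.extend(items)

    def snapshot(self):
        with self._lock:           # hold the lock to get a consistent view
            return list(self._items)


def run(num_threads=4, per_thread=1000):
    """Have several threads copy data into one buffer, then count it."""
    buf = SharedBuffer()
    threads = [
        threading.Thread(target=buf.put_many, args=(list(range(per_thread)),))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(buf.snapshot())


if __name__ == "__main__":
    print(run())  # 4000
```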

Stefan

From MitchAlsup1@21:1/5 to Stefan Monnier on Tue May 20 22:11:41 2025

On Tue, 20 May 2025 20:06:13 +0000, Stefan Monnier wrote:

Personally, I rarely use multi-threading, and when I do, it is usually in
the form of using mutex locks over shared buffers.
You lock the mutex if needed to copy data from one thread to another; or
when doing a task that depends on the data being consistent.

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

Also note: I need to use the "affinity" service under taskMangler
so that only 4 processes share a core, performance goes up by roughly
20%.

The CFD simulations typically run for 10 hours (each, wall clock time),
and every second; dump out a dozen KB of compressed data (each). The
disk load is low enough I can do other "stuff" on the computer as
long as I affinitize the heavy loads to 5 of the 6 cores.

Stefan

From David Schultz@21:1/5 to All on Tue May 20 19:34:20 2025

On 5/20/25 5:11 PM, MitchAlsup1 wrote:

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

Reminds me of something from the 90's. The computer science department
had a UNIX machine built around many 80386 CPUs. I was taking a course
on Internet programming and the best editor available was vi. Ugh.

So I downloaded the sources for emacs. Giving make the option to run
multiple processes improved the build speed a lot.

--
http://davesrocketworks.com
David Schultz
"The cheaper the crook, the gaudier the patter." - Sam Spade

From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Wed May 21 00:29:46 2025

On Tue, 20 May 2025 10:49:54 -0400, Stefan Monnier wrote:

Indeed, AFAIK, what we call "HDD cache" is actually just the RAM used by
the embedded CPU inside the drive for its operation.

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS that
the write has completed when it hasn’t.

From Lawrence D'Oliveiro@21:1/5 to All on Wed May 21 00:34:24 2025

On Tue, 20 May 2025 22:11:41 +0000, MitchAlsup1 wrote:

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

I guess that depends on what your licence server says. ;)

(Unless it’s open-source software, of course ...)

From Lawrence D'Oliveiro@21:1/5 to BGB on Wed May 21 00:57:25 2025

On Tue, 20 May 2025 00:16:44 -0500, BGB wrote:

Also the SATA interface is technically faster and has a higher bandwidth
than it can actually read stuff from the HDD itself, so if the HDD's
cache can do *anything* (in terms of prefetch and allowing IO requests
to be completed more quickly) it is still a win.

But that depends on the disk cache having something useful in it. Which,
as we have seen from its place in the hierarchy and the basic statistics
of cache operation, is pretty unlikely.

The cost/benefit balance is negative: it contributes essentially nothing
to performance, while detracting to some extent from reliability.

Well, say, because the FS designers seem to casually assume smaller
numbers of multi-MB files, rather than people filling the drive with
millions of files most of which being kB sized.

Filesystem designers have solutions for that. Look up “tail-packing”.

This is why drive designers should not be second-guessing OS filesystem designers.

The caches serve different purposes.

I think drive designers suffer from an inferiority complex: they know
their product is right at the bottom of the performance hierarchy, so they
try to come up with ways to make them seem faster than they are, instead
of concentrating on their number one job, which is ensuring the integrity
of the data they’re entrusted with storing.

From Lawrence D'Oliveiro@21:1/5 to David Schultz on Wed May 21 01:00:24 2025

On Tue, 20 May 2025 19:34:20 -0500, David Schultz wrote:

Giving make the option to run multiple processes improved the build
speed a lot.

The fun thing about “make -j«nrprocesses»” is you can omit the «nrprocesses» and it will create as many processes as it can find an
excuse for.

Try this with something decently-sized, like an FFmpeg build, and watch
your system come to its knees. ;)
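
A more restrained alternative is to bound the job count by the number of CPUs. The sketch below only builds the command line; the helper name `make_command` is made up for illustration, and the `-jN` form is the one GNU make accepts.

```python
import os


def make_command(jobs=None):
    """Build an argv list for a bounded parallel make invocation.

    With jobs=None, use the CPU count the OS reports (falling back to 1),
    giving 'make -jN' instead of the unbounded bare '-j'.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", f"-j{jobs}"]


if __name__ == "__main__":
    print(make_command(4))  # ['make', '-j4']
    print(make_command())   # depends on the machine
```

The resulting list can be handed to `subprocess.run` in a source tree that has a Makefile.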

From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Wed May 21 00:32:45 2025

On Tue, 20 May 2025 16:06:13 -0400, Stefan Monnier wrote:

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

I have no idea what the difference is supposed to be; as far as I’m concerned, they’re synonyms.

From George Neuner@21:1/5 to All on Tue May 20 21:30:31 2025

On Tue, 20 May 2025 22:11:41 +0000, [email protected] (MitchAlsup1)
wrote:

On Tue, 20 May 2025 20:06:13 +0000, Stefan Monnier wrote:

Personally, I rarely use multi-threading, and when I do, it is usually in
the form of using mutex locks over shared buffers.
You lock the mutex if needed to copy data from one thread to another; or
when doing a task that depends on the data being consistent.

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

Processes on the same core are concurrent - processes on different
cores are parallel.

Also note: I need to use the "affinity" service under taskMangler
so that only 4 processes share a core, performance goes up by roughly
20%.

The CFD simulations typically run for 10 hours (each, wall clock time),
and every second; dump out a dozen KB of compressed data (each). The
disk load is low enough I can do other "stuff" on the computer as
long as I affinitize the heavy loads to 5 of the 6 cores.

Stefan

From Lynn Wheeler@21:1/5 to Stephen Fuld on Tue May 20 16:38:31 2025

Stephen Fuld <[email protected]d> writes:

I presume you know that the 3880 controller did not do what today we
call command queuing, so I think you were referring to a potential
queue in the host. That being the case, the controller doesn't know
if there is a queue or not. So given that, why not start reading
record 1 on the next track. If a request comes in, you can abandon
the read to service the request - no harm, no foul. If there isn't,
and you subsequently get a request for that track, it's a big win.
The only potential loss is if you get a request for the track that was
LRU and got pushed out of the cache.

over optimizing full track read ahead could lock out other tasks that
had competing requirements for other parts of the disk.

trivia: early 70s, IBM decided to add virtual memory to all 370s. Early
last decade I was asked to track down the decision. I found staff member
to executive making the decision. Basically MVT (IBM's high end, major
batch system) storage management was so bad that (multiprogramming)
region sizes had to be specified four times larger than used, as a
result typical (high-end) 1mbyte 370/165 only ran four regions
concurrently, insufficient to keep system busy and justified. Running
MVT in a 16mbyte virtual address space (sort of like running MVT in CP67
16mbyte virtual machine) would allow concurrent regions to be increased
by factor of four times (capped at 15 because of 4bit storage protect
key) with little or no paging. Later as high-end systems got larger,
they needed more than 15 concurrent running regions ... and so switched
from VS2/SVS (single 16mbyte virtual address space) to VS2/MVS (a
separate 16mbyte virtual address space for each "region", went through
MVT->VS2/SVS->VS2/MVS)

along the way, I had been pontificating that DASD (disks) relative
system throughput has been decreasing ... in 1st part of 80s, I turned
out analysis that in the 15yr period since the IBM 360 1st ships,
DASD/disk relative system throughput had declined by an order of
magnitude (i.e. DASD got 4-5 times faster while systems got 40-50 times
faster). Some DASD division executive took exception and assigned the
division performance group to refute the claim ... after a few weeks,
they came back and basically said I had slightly understated the
issue. The performance group then respun the analysis for user group
presentation on how to configure disks and filesystem to improve system
throughput (SHARE63, B874, 16Aug1984).

1970 IBM 2305 fixed-head disk controller supported 8 separate pseudo
device addresses ("multiple exposure") for each 2305 disk ... each
having channel program that the controller could optimize. In 1975, I
was asked to help enhance low-end 370 that had integrated channels and
integrated device controllers ... and I wanted to upgrade microcode so I
could just update a queue of channel programs that the (integrated
microcode) controller could optimize (wasn't allowed to ship the product).

Later I wanted to add "multiple exposure" support to the 3830 (precursor
to the 3880) for 3350 (moveable arm) disks. An IBM east coast group that
was working on emulated electronic memory disks considered that it might
compete and got it vetoed (sometime later they got shut down; they were
told IBM was selling all the electronic memory it could make as higher
markup processor memory).

--
virtualization experience starting Jan1968, online at home since Mar1970

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Chris M. Thomasson on Wed May 21 03:41:45 2025

On Tue, 20 May 2025 18:39:54 -0700, Chris M. Thomasson wrote:

Processes on the same core are concurrent - processes on different
cores are parallel.

Only if the cores and/or "hardware threads" do not interfere with one another?

That’s why I think the distinction is meaningless.

Yes, there is a valid distinction to be made between truly concurrent/
parallel processes/threads, and ones which are made to appear so by
preemptive scheduling on shared hardware.

Not sure if there is a good term for the latter: “timesliced”? “timeshared”? “pseudoconcurrent”? “pseudoparallel”?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to BGB on Wed May 21 03:46:44 2025

On Tue, 20 May 2025 20:08:28 -0500, BGB wrote:

On 5/20/2025 7:29 PM, Lawrence D'Oliveiro wrote:

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS
that the write has completed when it hasn’t.

Note that (with SATA and similar) the OS can request that the drive
flush its caches, and (in theory) drive should not respond to more
requests until everything has been fully written back to disk.

I mentioned elsewhere that a special function was added that was supposed
to mean “really flush your caches dammit”.

But there is still no way to tell that the drive really does what you
demand that it do, and isn’t still lying about it ...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lynn Wheeler on Tue May 20 20:49:25 2025

On 5/20/2025 7:38 PM, Lynn Wheeler wrote:

Stephen Fuld <[email protected]d> writes:

I presume you know that the 3880 controller did not do what today we
call command queuing, so I think you were referring to a potential
queue in the host. That being the case, the controller doesn't know
if there is a queue or not. So given that, why not start reading
record 1 on the next track. If a request comes in, you can abandon
the read to service the request - no harm, no foul. If there isn't,
and you subsequently get a request for that track, it's a big win.
The only potential loss is if you get a request for the track that was
LRU and got pushed out of the cache.

over optimizing full track read ahead could lock out other tasks that
had competing requirements for other parts of the disk.

Sure. That is why I asked if, with all the traces you had collected,
you determined whether it was worthwhile to do it.

snipped some stories.

I just want to add that I, for one, enjoy your stories of your time with
IBM, and all the internal struggles, both technological and "political"
the company went through.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 20 20:54:08 2025

On 5/20/2025 5:29 PM, Lawrence D'Oliveiro wrote:

On Tue, 20 May 2025 10:49:54 -0400, Stefan Monnier wrote:

Indeed, AFAIK, what we call "HDD cache" is actually just the RAM used by
the embedded CPU inside the drive for its operation.

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS that the write has completed when it hasn’t.

So you have no objection to caching data on reads such as reading in
subsequent data. And furthermore, you saw earlier that the host can
turn on or off, either for all writes or a particular write, the
completion return before the data is on the disk. No one is "lying".
You just have to understand what you are asking for.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 20 20:58:53 2025

On 5/20/2025 8:46 PM, Lawrence D'Oliveiro wrote:

On Tue, 20 May 2025 20:08:28 -0500, BGB wrote:

On 5/20/2025 7:29 PM, Lawrence D'Oliveiro wrote:

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS
that the write has completed when it hasn’t.

Note that (with SATA and similar) the OS can request that the drive
flush its caches, and (in theory) drive should not respond to more
requests until everything has been fully written back to disk.

I mentioned elsewhere that a special function was added that was supposed
to mean “really flush your caches dammit”.

But there is still no way to tell that the drive really does what you
demand that it do, and isn’t still lying about it ...

Sure there is. Just do a small write to a random location and time it.
Repeat several times to assure consistent results.
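
A rough sketch of such a timing probe, under stated assumptions: it writes to an ordinary file and uses `os.fsync`, which only asks the OS to push the data down; whether that reaches the platters depends on the OS, filesystem, and drive. The function name and parameters are hypothetical, and a serious test would target a raw device.

```python
import os
import random
import tempfile
import time


def time_synced_writes(path, file_size=1 << 20, block=512, repeats=5, seed=0):
    """Time small writes at random offsets, each followed by fsync."""
    rng = random.Random(seed)
    timings = []
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        os.ftruncate(fd, file_size)          # give the file a fixed size
        for _ in range(repeats):
            # Pick a block-aligned random offset inside the file.
            offset = rng.randrange(0, file_size - block, block)
            os.lseek(fd, offset, os.SEEK_SET)
            start = time.perf_counter()
            os.write(fd, os.urandom(block))
            os.fsync(fd)                     # request a flush to stable storage
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for t in time_synced_writes(os.path.join(tmp, "probe.bin")):
            print(f"{t * 1e3:.3f} ms")
```

On a rotating disk, a write that really reaches the media should cost on the order of milliseconds (seek plus rotational delay), so consistently much shorter times would suggest that something in the path is acknowledging from cache.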

Besides, if a disk vendor was foolish enough to not follow the spec and
not document that fact, customers would soon find out and that would
ruin that vendor's reputation. They wouldn't risk it.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Wed May 21 05:39:25 2025

On Tue, 20 May 2025 20:54:08 -0700, Stephen Fuld wrote:

And furthermore, you saw earlier that the host can turn on or off,
either for all writes or a particular write, the completion return
before the data is on the disk.

The host can *request* such a thing, nothing more. And of course the drive
can answer in the affirmative.

Whether a suitable action actually takes place is another matter entirely.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Tue May 20 23:42:43 2025

On 5/20/2025 10:39 PM, Lawrence D'Oliveiro wrote:

On Tue, 20 May 2025 20:54:08 -0700, Stephen Fuld wrote:

And furthermore, you saw earlier that the host can turn on or off,
either for all writes or a particular write, the completion return
before the data is on the disk.

The host can *request* such a thing, nothing more. And of course the drive can answer in the affirmative.

Whether a suitable action actually takes place is another matter entirely.

See my response to this in a previous post.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to [email protected] on Wed May 21 07:05:38 2025

[email protected] (MitchAlsup1) writes:

On Tue, 20 May 2025 20:06:13 +0000, Stefan Monnier wrote:

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

That's an example of an embarrassingly parallel problem, and you can
run it on a machine with 20 processors or 20 machines with 1 processor
each without needing any communication between the processes.

Concurrency is concerned with the coordination of simultaneous
processes (or threads) that need to communicate, even if it is just to
coordinate the access to a shared resource. The classical example is
how to coordinate the simultaneous access to a bank account. If the
account contains EUR 100 (and allows no overdraft), and two people
want to withdraw EUR 100 from it at the same time, what happens?

Concurrency is an issue already when only a single core is involved
(if the account is checked before the withdrawal happens, and there
can be a task switch to the other withdrawal, the account can be
overdrafted; this is a classic race condition), but becomes harder
when parallel hardware is involved.
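
The bank account case can be sketched in a few lines. This is a minimal illustration with hypothetical names (`Account`, `run_two_withdrawals`), not code from the book; the point is that the balance check and the update sit under one lock, so no task switch can land between them.

```python
import threading


class Account:
    """Account that allows no overdraft."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Check and update happen under the same lock, so a second
        # withdrawal cannot observe the balance between the two steps.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_two_withdrawals():
    """Two people try to withdraw EUR 100 from an account holding EUR 100."""
    account = Account(100)
    results = []

    def worker():
        results.append(account.withdraw(100))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, sorted(results)


print(run_two_withdrawals())  # (0, [False, True]): exactly one withdrawal succeeds
```

Without the lock, both threads could pass the check before either subtracts, which is the overdraft described above, and that can happen on a single core as well as on several.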

I have not read the book, but it seems that a more appropriate title
would be "Is Concurrent Programming Hard ...".

Parallel computing, OTOH seems to be more concerned with building
computers with many processors that execute HPC applications quickly,
and programming the HPC applications such that they make good use of
these computers.

That leaves the question open what HPC applications are. It's
somewhat cyclic: Those that can make good use of parallel
(super-)computers. The non-cyclical part is that those are
applications that are important enough to someone for financing such
supercomputers. The example of CFD (computational fluid dynamics) is
maybe _the_ original HPC application: the nuclear weapons laboratories
in the USA were the first customers of supercomputers and parallel
computers.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Monnier@21:1/5 to All on Wed May 21 08:23:49 2025

Personally, I rarely use multi-threading, and when I do, it is usually
in the form of using mutex locks over shared buffers.
You lock the mutex if needed to copy data from one thread to another; or
when doing a task that depends on the data being consistent.

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

Both: AFAIK the choice of how to divide&spread the data and the work is
in the parallelism camp, while the choice of how to synchronize them is
in the concurrency camp.

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Stefan Monnier on Wed May 21 14:08:37 2025

On Wed, 21 May 2025 12:23:49 +0000, Stefan Monnier wrote:

Personally, I rarely use multi-threading, and when I do, it is usually
in the form of using mutex locks over shared buffers.
You lock the mutex if needed to copy data from one thread to another; or
when doing a task that depends on the data being consistent.

FWIW, I think these kinds of things usually fall in the scope of
concurrency rather than parallelism.

When I run 20-copies of a FEM CFD application, each uni-process::
am I running concurrently ?? or in parallel ?? or both ??

Both: AFAIK the choice of how to divide&spread the data and the work is
in the parallelism camp, while the choice of how to synchronize them is
in the concurrency camp.

I think of it as Both--especially when affinity is not used and the
processes contend for CPUs randomly. When process[a] and process[b]
are running simultaneously (on different cores) they are parallel;
when running back-to-back on the same core they are concurrent.

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed May 21 13:39:33 2025

Lawrence D'Oliveiro <[email protected]d> writes:

On Tue, 20 May 2025 20:54:08 -0700, Stephen Fuld wrote:

And furthermore, you saw earlier that the host can turn on or off,
either for all writes or a particular write, the completion return
before the data is on the disk.

The host can *request* such a thing, nothing more. And of course the drive
can answer in the affirmative.

Whether a suitable action actually takes place is another matter entirely.

No, it is not. No drive manufacturer would survive a lie.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From George Neuner@21:1/5 to [email protected] on Wed May 21 12:09:23 2025

On Wed, 21 May 2025 03:41:45 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

On Tue, 20 May 2025 18:39:54 -0700, Chris M. Thomasson wrote:

Processes on the same core are concurrent - processes on different
cores are parallel.

Only if the cores and/or "hardware threads" do not interfere with one
another?

That’s why I think the distinction is meaningless.

Yes, there is a valid distinction to be made between truly concurrent/
parallel processes/threads, and ones which are made to appear so by
preemptive scheduling on shared hardware.

Not sure if there is a good term for the latter: “timesliced”?
“timeshared”? “pseudoconcurrent”? “pseudoparallel”?

Yeah ... but recall that the terminology is passed down from the days
of single core, single thread CPUs. The meanings have been bent,
spindled, and mutilated by time, and conflated in general awareness.

Multi-programming meant running multiple programs on a single CPU.
[i.e. "concurrent" programming] Preemptive timeslicing generally was
implied, but the term itself did not distinguish how programs were
separated: e.g., programmatically (purely by software), segmented by
partition, interleaved (with or without VMM protection), etc.

Multi-processing [from multi-processor] meant running multiple
processes in parallel on separate CPUs (remember, single core, single
thread).

Multi-threading meant being able to (pseudo)simultaneously execute
along multiple "paths" within a /single/ process. The term itself did
not distinguish whether this was done concurrently (single CPU) or in
parallel, and also did not distinguish whether scheduling was
preemptive or cooperative.

Multi-tasking was a catch-all term that could mean any combination of
the others.

Nobody cares anymore.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Stephen Fuld on Wed May 21 12:25:04 2025

Stephen Fuld wrote:

On 5/20/2025 8:46 PM, Lawrence D'Oliveiro wrote:

On Tue, 20 May 2025 20:08:28 -0500, BGB wrote:

On 5/20/2025 7:29 PM, Lawrence D'Oliveiro wrote:

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS
that the write has completed when it hasn’t.

Note that (with SATA and similar) the OS can request that the drive
flush its caches, and (in theory) drive should not respond to more
requests until everything has been fully written back to disk.

I mentioned elsewhere that a special function was added that was supposed
to mean “really flush your caches dammit”.

But there is still no way to tell that the drive really does what you
demand that it do, and isn’t still lying about it ...

Sure there is. Just do a small write to a random location and time it. repeat several times to assure consistent results.

Besides, if a disk vendor was foolish enough to not follow the spec and
not document that fact, customers would soon find out and that would
ruin that vendor's reputation. They wouldn't risk it.

I suspect some of Lawrence's concerns go back to the Win3.1 days,
when HDDs could now afford to expand the read/write buffer and improve
their performance stats. Because the Parallel ATA interface was synchronous,
one way drives "improved" their performance stats was to lie and send back
a fake ACK to writes until they had enough sectors to make up a whole track.

Yes it risked scrambling the file system if one powered off too quick.
But I gather DOS/Win3.1 FAT file system would do that enough on its own
that the drive wouldn't be blamed.

The PATA spec includes commands to enable/disable read look-ahead feature,
and enable/disable write cache, but doesn't say what they actually do.
Also there are various commands for setting idle mode, standby mode,
power down mode, etc. but none say what happens to the cache.

I suspect that the HDD manufacturers looked at what drive commands Win3.1 issued when you exited and used those to trigger write back of any
pending cached data.

When I installed WinNT 3.1 (beta) in 1992 it came with explicit
instructions that HDD must use write-through caching, which was enabled/disabled by a jumper pin on the drive.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to EricP on Wed May 21 12:47:34 2025

EricP wrote:

Stephen Fuld wrote:

On 5/20/2025 8:46 PM, Lawrence D'Oliveiro wrote:

On Tue, 20 May 2025 20:08:28 -0500, BGB wrote:

On 5/20/2025 7:29 PM, Lawrence D'Oliveiro wrote:

If it were just I/O buffers for operations in progress, that would be
fine. The problem is when it keeps data around instead of immediately
writing it out, and what’s worse, lies about it, so it tells the OS
that the write has completed when it hasn’t.

Note that (with SATA and similar) the OS can request that the drive
flush its caches, and (in theory) drive should not respond to more
requests until everything has been fully written back to disk.

I mentioned elsewhere that a special function was added that was supposed
to mean “really flush your caches dammit”.

But there is still no way to tell that the drive really does what you
demand that it do, and isn’t still lying about it ...

Sure there is. Just do a small write to a random location and time
it. repeat several times to assure consistent results.

Besides, if a disk vendor was foolish enough to not follow the spec
and not document that fact, customers would soon find out and that
would ruin that vendor's reputation. They wouldn't risk it.

I suspect some of Lawrence's concerns go back to the Win3.1 days,
when HDD could now afford to expand the read/write buffer and improve
their performance stats. Because the Parallel ATA interface was synchronous one way drives "improved" their performance stats was to lie and send back
a fake ACK to writes until they had enough sectors to make up a whole
track.

Yes it risked scrambling the file system if one powered off too quick.
But I gather DOS/Win3.1 FAT file system would do that enough on its own
that the drive wouldn't be blamed.

The PATA spec includes commands to enable/disable read look-ahead feature, and enable/disable write cache, but doesn't say what they actually do.
Also there are various commands for setting idle mode, standby mode,
power down mode, etc. but none say what happens to the cache.

I suspect that the HDD manufacturers looked at what drive commands Win3.1 issued when you exited and used those to trigger write back of any
pending cached data.

When I installed WinNT 3.1 (beta) in 1992 it came with explicit
instructions that HDD must use write-through caching, which was enabled/disabled by a jumper pin on the drive.

A wiki for hobby OS developers says for ATA cache flush that you
have to send an explicit E7 command or you can get bad sectors:
http://wiki.osdev.org/ATA_PIO#Cache_Flush

but the page on ATA commands notes that the E7 Cache Flush command
was added for rev ATA-4 which was after the 1994 ATA preliminary
specs I was looking at, which say E7 is Reserved.
http://wiki.osdev.org/ATA_Command_Matrix

So there was a period prior to the E7 command where ATA disks must have inferred when to flush their write cache, and in the true tradition of
PC's they probably all did it differently.

Thus the legend was born.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lynn Wheeler@21:1/5 to Chris M. Thomasson on Wed May 21 07:06:26 2025

"Chris M. Thomasson" <[email protected]> writes:

Only if the cores and/or "hardware threads" do not interfere with one another? Fwiw, an example of an embarrassingly parallel algorithm is computing the Mandelbrot set. Actually, this reminds me of the "alias" problem with Intel hyper threading in the past.

shortly after graduating and joining IBM, I got roped into helping with
hyperthreading the 370/195. It had pipelined, out-of-order execution,
but conditional branches drained the pipeline and most code only ran
the system at half its rated throughput. Two hardware i-streams ... each
running at half throughput would (might) keep the system at full
throughput.

hardware hyperthreading is mentioned in this account of Amdahl winning
the battle to make ACS 360-compatible (folklore is that it was shut down
because IBM was concerned that it would advance the state-of-the-art too
fast and IBM would lose control of the market, and Amdahl leaves IBM).
https://people.computing.clemson.edu/~mark/acs_end.html

Then the decision was made to add virtual memory to all 370s, and it was
decided it would be too difficult to add it to the 370/195, and all new
195 activity was shut down. (Note: the operating system for the 195 was
MVT, and its shared memory multiprocessor support on the 360/65MP was
only getting 1.2-1.5 times the throughput of a single processor, so
running the 195 as a simulated multiprocessor with two i-streams would
only be more like .6 times fully rated throughput; all hardware might be
running at 100%, but the SMP overhead would limit productive
throughput.) trivia: the multiprocessor overhead continues up through
MVS.

also after joining IBM, one of my hobbies was enhanced production
operating systems for internal datacenters and the online
sales&marketing support HONE systems were early (& long time)
customer. Then with decision to add virtual memory to all 370s, there
was also decision to do VM370 and in the morph of CP67->VM370 a lot of
things were simplified and/or dropped (including multiprocessor
support). I then start adding stuff back into VM370 and initially do multiprocessor support for the HONE 168s so they can add 2nd processor
to all their systems (and managed to get twice single processor
throughput with some cache affinity hacks and other stuff).

In the mid-70s, after Future Systems implodes,
http://www.jfsowa.com/computer/memo125.htm
https://en.wikipedia.org/wiki/IBM_Future_Systems_project
https://people.computing.clemson.edu/~mark/fs.html

I get roped into helping with a 370 16-CPU multiprocessor design. It was
going fine until somebody tells head of POK (high end 370 processors)
that it could be decades before POK's favorite son operating system (now
"MVS") had ("effective") 16-cpu support (POK doesn't ship a 16-CPU
system until after the turn of the century) ... and some of us are
invited to never visit POK again.

--
virtualization experience starting Jan1968, online at home since Mar1970

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to EricP on Wed May 21 17:19:47 2025

EricP <[email protected]> writes:

I suspect some of Lawrence's concerns go back to the Win3.1 days,
when HDD could now afford to expand the read/write buffer and improve
their performance stats. Because the Parallel ATA interface was synchronous
one way drives "improved" their performance stats was to lie and send back
a fake ACK to writes until they had enough sectors to make up a whole track.

AFAIK there was no tagged command queuing in the (P)ATA interface
until pretty late in the game (and then it was not used by Linux
AFAIK). So if the HDD had waited until the sector was on the platter
(and IIRC there was a way to switch the drive to such a synchronous
mode), anything that writes to the HDD would have been extremely slow:

Wait until the sector is found, write the sector, report success, get
the next command that writes the next sector, but now the head is
already past the sector and you have to wait for another rotation.

By contrast, if you return success as soon as the sector is in the
HDD's RAM, the OS can continue sending data, and the HDD can write the
data in any order that it deems appropriate; it does not have to wait
for a complete track or something, it can start writing right away,
and, of course, if there is a sequence of sectors in the cache, write
that sequence in one go.
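
To put a number on "extremely slow": if every single-sector write has to wait for its own rotation, the drive completes at most one sector per revolution. A back-of-the-envelope sketch, where the spindle speeds, the 512-byte sector size, and the one-sector-per-command pattern are illustrative assumptions rather than figures from this thread:

```python
def sync_sequential_write_rate(rpm, sector_bytes=512):
    """Bytes per second when each sector write costs one full rotation."""
    rotations_per_second = rpm / 60
    return rotations_per_second * sector_bytes


# Assumed spindle speeds, chosen only for illustration.
for rpm in (3600, 5400, 7200):
    rate = sync_sequential_write_rate(rpm)
    print(f"{rpm} rpm: {rate / 1024:.1f} KiB/s")
```

Even at the fastest of these speeds the result stays in the tens of KiB/s, far below what the same drive can stream when it is free to write consecutive sectors from its buffer in one pass.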

I have written a test that checks how much HDDs can reorder writes,
and my results are that, for the pattern I tested, there is no bound on
the out-of-orderness that the drives exhibit if you don't ask for a
barrier or sync. You can find the test software and the results of my
testing at

<http://www.complang.tuwien.ac.at/anton/hdtest/>

Yes it risked scrambling the file system if one powered off too quick.
But I gather DOS/Win3.1 FAT file system would do that enough on its own
that the drive wouldn't be blamed.

Actually on MS-DOS the usual way to shut down the system was to turn
the computer off, or in case of floppy disks to just take them out of
the drive after the light has gone out, and the amount of breakage on
the file system level was not big. I expect that the file system
synced pretty obsessively to achieve that.

When I installed WinNT 3.1 (beta) in 1992 it came with explicit
instructions that HDD must use write-through caching, which was
enabled/disabled by a jumper pin on the drive.

SCSI or PATA?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Anton Ertl on Wed May 21 15:06:27 2025

Anton Ertl wrote:

EricP <[email protected]> writes:

I suspect some of Lawrence's concerns go back to the Win3.1 days,
when HDD could now afford to expand the read/write buffer and improve
their performance stats. Because the Parallel ATA interface was synchronous
one way drives "improved" their performance stats was to lie and send back
a fake ACK to writes until they had enough sectors to make up a whole track.

AFAIK there was no tagged command queuing in the (P)ATA interface
until pretty late in the game (and then it was not used by Linux
AFAIK). So if the HDD had waited until the sector was on the platter
(and IIRC there was a way to switch the drive to such a synchronous
mode), anything that writes to the HDD would have been extremely slow:

Wait until the sector is found, write the sector, report success, get
the next command that writes the next sector, but now the head is
already past the sector and you have to wait for another rotation.

By contrast, if you return success as soon as the sector is in the
HDD's RAM, the OS can continue sending data, and the HDD can write the
data in any order that it deems appropriate; it does not have to wait
for a complete track or something, it can start writing right away,
and, of course, if there is a sequence of sectors in the cache, write
that sequence in one go.

Which would improve performance for Win3.1, but for WinNT, which had a
real file system with its own file cache, the synchronous writes
would not hurt performance and it gets better reliability.

I have written a test that checks how much HDDs can reorder writes,
and my results are that, for the pattern I tested, there is no bound on
the out-of-orderness that the drives exhibit if you don't ask for a
barrier or sync. You can find the test software and the results of my testing at

<http://www.complang.tuwien.ac.at/anton/hdtest/>

Yes it risked scrambling the file system if one powered off too quick.
But I gather DOS/Win3.1 FAT file system would do that enough on its own
that the drive wouldn't be blamed.

Actually on MS-DOS the usual way to shut down the system was to turn
the computer off, or in case of floppy disks to just take them out of
the drive after the light has gone out, and the amount of breakage on
the file system level was not big. I expect that the file system
synced pretty obsessively to achieve that.

When I installed WinNT 3.1 (beta) in 1992 it came with explicit
instructions that HDD must use write-through caching, which was
enabled/disabled by a jumper pin on the drive.

SCSI or PATA?

- anton

IIRC it needed a SCSI board for the CD-ROM drive (I don't remember why,
I just remember being miffed that I had to buy one) but could use IDE HDD.
As the SCSI HDDs, which were the same drives but with a SCSI interface,
were $100 more expensive, I would have used the IDE.

WinNT was developed on MIPS R4000 and ported to 80386/486 later.
Those MIPS systems used all SCSI so maybe they hadn't had time to port
the CD-ROM drivers to IDE but had ported the HDD drivers to IDE
before the beta release.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Monnier@21:1/5 to All on Wed May 21 15:30:43 2025

Processes on the same core are concurrent - processes on different
cores are parallel.

Only if the cores and/or "hardware threads" do not interfere with one
another?

That’s why I think the distinction is meaningless.

If you're talking about a set of processes running concurrently or in
parallel, then indeed the two terms are interchangeable, AFAIK.
If you're talking about research areas, parallelism and concurrency are different.

In the case of concurrency the core question is: Given a set of somewhat
independent tasks working on some chunks of data, make sure the computed
result is correct, e.g. design tools like mutexes, memory barriers,
transactional memory, static analysis, reasoning principles, etc...
whose core focus is on making sure there are no race conditions,
deadlocks, ...

In the case of parallelism, the core question instead is: given
a program/algorithm, restructure (or even completely replace) it so as
to divide it into somewhat independent tasks that can take advantage of multiple CPUs to finish the work faster.

Clearly, the two overlap, but they are nevertheless fairly different.
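
A small sketch of where the two concerns show up in one program, using hypothetical names. How the data is split into chunks is the parallelism question; collecting the partial results safely is the concurrency question, handled here by the executor. In standard CPython, threads will not speed up this CPU-bound loop; `ProcessPoolExecutor` is the usual substitute when real speedup is the goal.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Parallelism side: divide the work into roughly equal independent chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_squares(chunk):
    """Work done on one chunk; touches no shared state."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Concurrency side: the executor synchronizes hand-off of each
        # partial result, so no explicit mutex is needed here.
        partials = pool.map(sum_squares, chunks)
        return sum(partials)


print(parallel_sum_of_squares(list(range(10)), workers=3))  # 285
```

Because the chunks share nothing, the only coordination point is the final combination, which is about as small as the concurrency part of a parallel program gets.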

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From jseigh@21:1/5 to Lynn Wheeler on Wed May 21 17:32:34 2025

On 5/21/25 13:06, Lynn Wheeler wrote:
...

I get roped into helping with a 370 16-CPU multiprocessor design. It was going fine until somebody tells head of POK (high end 370 processors)
that it could be decades before POK's favorite son operating system (now "MVS") had ("effective") 16-cpu support (POK doesn't ship a 16-CPU
system until after the turn of the century) ... and some of us are
invited to never visit POK again.

I got the impression that MVS's processor management was really unwieldy,
the vary processor online stuff, and wasn't expected to happen more than
once after os boot.

When they put in support for multiple TOD clocks, they had to be kept in
sync. If an out of sync condition was detected, checked when bit 31 of
the clock carried out (about once per second), an external interrupt
would be raised, and you synced them back up by putting the cpu's into
set clock spin loops. In VMXA we would just signal processor them into
the sync logic. In MVS, I think they had to vary the processor back
offline before they could do something like that. Which meant every
time a processor came online from power off state, its TOD clock was
in an unset state, causing a TOD sync ext exception; all the online
processors would have to be varied offline so the sync code could be
run, and then varied online again. Repeat for every processor being
brought online from powered off state. MVS couldn't deal with it so they
had the hardware folk put in a standalone clock chip to set the
processor's clock on power on (allegedly a Timex watch chip).

So without that chip, there would always be a TOD sync exception at boot
time. There was a Japanese data center that had the hardware console
on the opposite side of the data center instead of side by side with the
system console. So when VMXA issued the operator message to press the
enable TOD set button on the hw console, the operator barely had time
to run over to the other side to press it. You had 30 secs but vmxa
messages were asynchronous (and probably on the wrong side of that spin
loop). Had to futz with the order of issuing that operator message.

Oh, and 370 arch stated that STCK would always be unique and monotonic
but that might not have been really true. With enough clock drift, a
lot could happen in the 1 second between TOD sync checks.
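
A rough check on the "about once per second" figure, as a minimal sketch.
It relies on the TOD clock format, where bit 0 is the most significant bit
and bit 51 increments once per microsecond. The function name is
hypothetical and only illustrates the arithmetic.

```python
# TOD clock bit numbering: bit 0 is the most significant bit, and
# bit 51 increments once per microsecond.
MICROSECOND_BIT = 51

def bit_period_seconds(bit: int) -> float:
    """Seconds between increments of the given TOD clock bit position."""
    return 2 ** (MICROSECOND_BIT - bit) / 1_000_000

# Interval between increments of bit 31, i.e. between sync checks.
sync_check_interval = bit_period_seconds(31)
print(sync_check_interval)
```

So bit 31 advances every 2^20 microseconds, a little over a second, which
is the window in which drifting clocks could go unnoticed.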

Joe Seigh


From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Thu May 22 02:45:25 2025

On Wed, 21 May 2025 15:30:43 -0400, Stefan Monnier wrote:

In the case of concurrency the core question is: Given a set of somewhat
independent tasks working on some chunks of data, make sure the computed
result is correct, e.g. design tools like mutexes, memory barriers,
transactional memory, static analysis, reasoning principles, etc...
whose core focus is on making sure there are no race conditions,
deadlocks, ...

In the case of parallelism, the core question instead is: given a program/algorithm, restructure (or even completely replace) it so as to divide it into somewhat independent tasks that can take advantage of
multiple CPUs to finish the work faster.

All those issues of race conditions, deadlocks, livelocks etc apply
equally whether the concurrency/parallelism is real (multiple physical
CPUs) or virtual (process preemption on shared CPUs).

The difference is that different bugs in the synchronization/locking show
up in different situations. This is why it is helpful to test your
concurrent code in a variety of situations.
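
A minimal sketch of why a single shared CPU is enough to expose a race. It
is a deterministic toy model, not real threads: each task is a generator
whose `yield` marks a point where preemption (or another CPU) could
intervene, and the `schedule` list is a hypothetical stand-in for the
scheduler's decisions.

```python
def increment(shared):
    """One task: an unsynchronized read-modify-write, split at the
    point where a preemption (or another CPU) can intervene."""
    value = shared["counter"]      # read
    yield                          # possible switch point
    shared["counter"] = value + 1  # write

def run(schedule):
    """Run two tasks on one simulated CPU in the given switch order."""
    shared = {"counter": 0}
    tasks = [increment(shared), increment(shared)]
    for task_id in schedule:
        next(tasks[task_id], None)  # advance that task by one step
    return shared["counter"]

print(run([0, 0, 1, 1]))  # task 0 finishes before task 1 starts
print(run([0, 1, 0, 1]))  # task 1 reads before task 0 writes
```

The first schedule counts both increments; the second loses one. Which
schedule you actually get depends on timing, which is the reason for
testing under a variety of loads and CPU counts.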


From George Neuner@21:1/5 to [email protected] on Wed May 21 22:19:50 2025

On Wed, 21 May 2025 15:06:27 -0400, EricP
<[email protected]> wrote:

Anton Ertl wrote:

EricP <[email protected]> writes:

When I installed WinNT 3.1 (beta) in 1992 it came with explicit
instructions that HDD must use write-through caching, which was
enabled/disabled by a jumper pin on the drive.

SCSI or PATA?

- anton

IIRC it needed a SCSI board for the CD-ROM drive (I don't remember why,
I just remember being miffed that I had to buy one) but could use IDE HDD.
As the SCSI HDDs were the same drives with a SCSI interface but cost
$100 more, I would have used the IDE.

WinNT was developed on MIPS R4000 and ported to 80386/486 later.
Those MIPS systems used all SCSI so maybe they hadn't had time to port
the CD-ROM drivers to IDE but had ported the HDD drivers to IDE
before the beta release.

I think the problem went back farther than that. I recall CD drives
[and printers also] running on PC-DOS/MS-DOS in the late 80s using
some kind of lobotomized SCSI controllers that would /not/ also work
for SCSI HDDs.


From Lawrence D'Oliveiro@21:1/5 to All on Thu May 22 02:49:53 2025

On Wed, 21 May 2025 14:08:37 +0000, MitchAlsup1 wrote:

... when running back-to-back on the same core they are concurrent.

Consecutively, surely (on a time-slice basis).

When a judge sentences you to serve different jail terms concurrently,
they are not added together.

When two processes run on the same CPU, their execution times are added together.


From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Thu May 22 02:48:17 2025

On Wed, 21 May 2025 07:05:38 GMT, Anton Ertl wrote:

That leaves the question open what HPC applications are. It's somewhat cyclic: Those that can make good use of parallel (super-)computers.

What makes a supercomputer? These days, it’s a massively parallel machine.
But so is a renderfarm. The difference with the super is, it has a
high-speed interconnect between the processor nodes. That’s what allows it
to cope with problems beyond the merely “embarrassingly parallel”, like a
renderfarm.

And that’s what makes it more expensive than the renderfarm.


From Torbjorn Lindgren@21:1/5 to [email protected] on Thu May 22 12:12:52 2025

BGB <[email protected]> wrote:

I also remember from some early 90s PCs some CD-ROM drives that used
some sort of non-standard interface. Typically, they would plug into an
ISA card with a cable that was (IIRC) somewhat narrower than a normal
IDE cable (I remember it being around the width of a floppy cable; two
ends with a direct connection, and no twists, unlike a typical floppy
cable which usually has part of the cable twisted).

Not entirely sure what it was. Wasn't an IDE/ATAPI CD-ROM though, do
know this much at least...

Digging more, there was apparently a proprietary 34-pin Sony connector,
which does at least resemble what I remember (and some of the drives do
look like the drives I remember seeing).

The common pre-ATAPI CD-ROM interfaces were SCSI, Panasonic/MKE,
Mitsumi and Sony. Early on there was also LMSI/Philips, but they
switched to one of the others fairly early. Which of these was most
common isn't easy to tell any longer.

There were cheap dedicated interface cards for all of these but most
people bought them bundled with soundcards which included the correct
interface on it. These cards (both dedicated and on soundcard) were
often PIO-only (instead of DMA) which was fine for a 1/2/4x CD-ROM but
not much else.

Later soundcards often had all three "main" interfaces to reduce the
number of SKUs they need and make it easier for customers, later
soundcards switched to providing a (usually gimped) ATA interface -
which then was removed a few years later as ATA interface was
everywhere already.

Panasonic & Mitsumi use unkeyed 40-pin cables (unkeyed IDE cables can
be used) while Sony went with the 34-pin floppy cable. In all cases
they were not compatible with actual IDE/ATAPI or floppy cards despite
the cable fitting... Yes, there were multiple incompatible non-IDE/ATA
CD-ROM interfaces with 40-pin connectors because OF COURSE...

The soundcard ATAPI headers that replaced them were often usable with
disks with the right drivers but performance was usually not very
good.

*: Had once encountered a computer (that at the time was being
discarded, but I got it and had it for a while). It had SCSI drives, and
also the weirdness that rather than the CPU and RAM being on the
motherboard, it was on a riser card (the MOBO was IIRC effectively just
card slots, IIRC they resembled 16-bit ISA slots but with a long
extended part on the front; similar to the "VESA Local Bus" IIRC).

IIRC, no connectors on the MOBO (apart from power IIRC), all of the
external connections (like VGA and mouse/keyboard) being on riser cards.

I remember the CPU riser card apparently having dual 486DX's, but from
what information I can gather, multi-socket systems weren't a thing with
the 486, so don't know what was going on there. Beyond this, was SIMM
RAM, and a lot of other stuff you would normally see on the MOBO
(excluding connectors).

Could be a Compaq SystemPro[1]? Not sure if there were any other 486
multi-CPU machines that were DOS/Windows (single-CPU) and Windows NT
compatible (i.e. ruling out Sequent Symmetry).

The daughtercard method isn't restricted to these, IBM used it a lot
on their high-end PS/2 and I've seen it used by someone on pretty much
every generation after, back in the days it wasn't uncommon to run out
of space on the main motherboard.

It was a weird machine, I had managed to get Win NT4 installed on it,
but I remember I couldn't do much else with it at the time. Also for
some unknown reason, while normal 16-bit ISA cards would plug into the
slots, they did not seem to be recognized by the OS.

If it was an original SystemPro they appear to be A-SMP (due to being
designed around the '386) so not everything can run on the "second"
CPU, perhaps that's what you remember?

The SystemPro XL appears to be Compaq's first full ("real") SMP 486
machine, this too appears to have leaned into using daughtercards.
This should behave more like a "normal" multi-socket machine.

Both may well be "first" - Compaq was really leading the field at this
point.

1. https://en.wikipedia.org/wiki/Compaq_SystemPro


From Dan Cross@21:1/5 to [email protected] on Thu May 22 11:32:24 2025

In article <rvpXP.453024$[email protected]>,
EricP <[email protected]> wrote:

WinNT was developed on MIPS R4000 and ported to 80386/486 later.

Bringup was done on an i860-based board. That never panned out
in the market, though.

- Dan C.


From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu May 22 17:34:08 2025

On Thu, 22 May 2025 2:49:53 +0000, Lawrence D'Oliveiro wrote:

On Wed, 21 May 2025 14:08:37 +0000, MitchAlsup1 wrote:

... when running back-to-back on the same core they are concurrent.

Consecutively, surely (on a time-slice basis).

When a judge sentences you to serve different jail terms concurrently,
they are not added together.

When two processes run on the same CPU, their execution times are added together.

Wall clock time adds.
process times do not.


From Scott Lurndal@21:1/5 to [email protected] on Thu May 22 17:42:14 2025

[email protected] (MitchAlsup1) writes:

On Thu, 22 May 2025 2:49:53 +0000, Lawrence D'Oliveiro wrote:

On Wed, 21 May 2025 14:08:37 +0000, MitchAlsup1 wrote:

... when running back-to-back on the same core they are concurrent.

Consecutively, surely (on a time-slice basis).

When a judge sentences you to serve different jail terms concurrently,
they are not added together.

When two processes run on the same CPU, their execution times are added
together.

Wall clock time adds.
process times do not.

Angels.
Pins.

Given that the two processes are effectively interleaved, the two problems solved by the individual processes are solved concurrently.

If one ran to completion before the other started, then they'd be considered consecutive.

Hyperthreading allows instruction level concurrency between independent processes assigned to the same core.


From Lawrence D'Oliveiro@21:1/5 to BGB on Thu May 22 22:41:45 2025

On Thu, 22 May 2025 12:39:00 -0500, BGB wrote:

But, yeah, this was apparently something that my mom grabbed from a
dumpster sometime around 20-25 years ago, and its fate was that parents
later returned it to a dumpster.

Don’t you have regulations, or at least discouragements, against e-waste going to landfill?


From Lawrence D'Oliveiro@21:1/5 to All on Thu May 22 22:42:51 2025

On Thu, 22 May 2025 17:34:08 +0000, MitchAlsup1 wrote:

When two processes run on the same CPU, their execution times are added
together.

Wall clock time adds. process times do not.

While the process is current, execution time accumulates according to wall-clock time. Isn’t that the definition of execution time?


From Waldek Hebisch@21:1/5 to Stephen Fuld on Fri May 23 00:18:53 2025

Stephen Fuld <[email protected]d> wrote:

On 5/13/2025 1:12 AM, Lawrence D'Oliveiro wrote:

On Tue, 13 May 2025 07:40:35 GMT, Anton Ertl wrote:

In this case the drive knows things that the OS does not: Consider that
the OS asked for sector N. What happens when the arm finally has settled
enough to read from the track, and the first sector it sees is sector
N+10?

The OS isn’t likely to ask for one sector at a time.

Frequently true, so consider this related scenario. The host requests a
read of 10 sectors starting at sector N. When the head settles, the
next sector is N+6. Without any in drive buffering, it would wait
almost a full revolution till record N comes under the head.

With buffering, but no cache, the drive reads record N+5 to N+9 into the buffer, then waits until the drive rotates to record N and begins the
host transfer. This is an improvement because the transfer to the host
is faster than the transfer from the disk, and the last 3 sectors can be transferred out of the buffer without waiting for the disk, so the
transfer is completed faster.

Now consider with caching. Similar, but after record N+9, the drive
continues reading into the cache. Let's say there are 30 records on this
track. If it reads all of the data into the cache, then proceeds as
above once the disk rotates to record N, it has cost zero time. And if
the host then issues another 10-sector read sequential to the initial
one (or actually for any sectors from N+10 to N+29), this can be satisfied
out of the cache without any drive delay, so much faster than without
the cache, and the heads can be moved away to start satisfying another
unrelated request. There is minimal cost and substantial benefit.

Now you have argued that the file system cache should take care of that,
presumably issuing prefetch reads for the next sectors. This will work,
of course, but has some disadvantages relative to using the drive cache.
Specifically, since it is unlikely the prefetch request will be
received by the drive before record N+10 has passed the heads, it will
incur most of an additional rotational delay, which will tie up the
drive, preventing it from responding to some other request.

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful addition,
and since the drive has to have some DRAM anyway for other reasons, the
cost is minimal. You can think of the drive cache as the "next level"
cache behind the host based cache.
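
The buffering scenario above can be sketched in units of "sector-times"
(the time for one sector to pass under the head). This is a toy model
under stated assumptions: a 30-sector track, the head arriving just before
sector N+6 (so N+6 through N+9 are buffered first), host transfer time
ignored, and hypothetical function names.

```python
def disk_time_unbuffered(track_sectors, arrival, count):
    """Sector-times until `count` sectors starting at offset 0 are read
    when the drive must start at the first requested sector."""
    wait = (track_sectors - arrival) % track_sectors
    return wait + count

def disk_time_buffered(track_sectors, arrival, count):
    """Same request, but the drive may buffer requested sectors in
    whatever order they pass under the head."""
    if arrival == 0 or arrival >= count:
        # The head lands at the first requested sector, or outside the
        # request entirely: buffering changes nothing.
        return disk_time_unbuffered(track_sectors, arrival, count)
    # Head lands inside the request: the last sector still needed is
    # arrival - 1, which has passed after exactly one revolution.
    return track_sectors

print(disk_time_unbuffered(30, arrival=6, count=10))
print(disk_time_buffered(30, arrival=6, count=10))
```

In this model the buffered read finishes after one revolution instead of
waiting most of a revolution and then reading all ten sectors. A track
cache costs the same single revolution but also leaves the remaining
sectors of the track available for a follow-up request.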

It is pretty clear that due to drive mechanics a track cache/buffer
is useful. However, the real question is about size: how big
should it be? For "consumer" drives I see claims of 256 MB
cache. Given a rather optimistic 200 MB/s transfer rate, it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that, say, 4 tracks should be enough for reading. For
writing one could use a few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.
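
A quick sketch of that arithmetic. The 256 MB and 200 MB/s figures come
from the paragraph above; the spindle speeds (5400 and 7200 rpm) are
assumptions added for illustration, and the function name is hypothetical.

```python
def cache_in_rotations(cache_mb, transfer_mb_per_s, rpm):
    """How many platter rotations it takes to stream `cache_mb` of data."""
    seconds = cache_mb / transfer_mb_per_s
    return seconds * rpm / 60

# Rotations needed to fill the cache at two assumed spindle speeds.
for rpm in (5400, 7200):
    print(rpm, round(cache_in_rotations(256, 200, rpm)))
```

256 MB at 200 MB/s is 1.28 s of streaming, which lands at roughly 115
rotations at 5400 rpm and about 154 at 7200 rpm, the same ballpark as the
estimate above.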

--
Waldek Hebisch


From MitchAlsup1@21:1/5 to Scott Lurndal on Fri May 23 01:54:28 2025

On Thu, 22 May 2025 17:42:14 +0000, Scott Lurndal wrote:

[email protected] (MitchAlsup1) writes:

On Thu, 22 May 2025 2:49:53 +0000, Lawrence D'Oliveiro wrote:

On Wed, 21 May 2025 14:08:37 +0000, MitchAlsup1 wrote:

... when running back-to-back on the same core they are concurrent.

Consecutively, surely (on a time-slice basis).

When a judge sentences you to serve different jail terms concurrently,
they are not added together.

When two processes run on the same CPU, their execution times are added
together.

Wall clock time adds.
process times do not.

Angels.
Pins.

Given that the two processes are effectively interleaved, the two
problems
solved by the individual processes are solved concurrently.

If one ran to completion before the other started, then they'd be
considered
consecutive.

Hyperthreading allows instruction level concurrency between independent processes assigned to the same core.

Same physical core, different virtual core.


From George Neuner@21:1/5 to [email protected] on Thu May 22 23:47:14 2025

On Thu, 22 May 2025 02:49:53 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:

On Wed, 21 May 2025 14:08:37 +0000, MitchAlsup1 wrote:

... when running back-to-back on the same core they are concurrent.

Consecutively, surely (on a time-slice basis).

"consecutively" implies serially and running to completion.

Not sure of a better term ... maybe "adjacently"?

When a judge sentences you to serve different jail terms concurrently,
they are not added together.

Different domain - terminology not relevant.

When two processes run on the same CPU, their execution times are added together.

Depends on the context: if you want to know how long until both are
complete, then yes. If you want to know when the 1st completes - or alternatively when the 2nd one begins, then the answer depends on
whether they are run serially (consecutively ;-) or concurrently
(interleaved).
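
A minimal sketch of that distinction, assuming a single CPU, zero
context-switch cost, and made-up job lengths of 3 and 5 time units. The
function name and parameters are hypothetical.

```python
def completion_times(cpu_needs, time_slice=None):
    """Return the wall-clock time at which each job finishes on one CPU.

    time_slice=None runs jobs serially to completion; otherwise jobs
    are interleaved round-robin with the given slice length.
    """
    remaining = list(cpu_needs)
    finished = [None] * len(remaining)
    clock = 0
    while any(r > 0 for r in remaining):
        for job, need in enumerate(remaining):
            if need == 0:
                continue
            run = need if time_slice is None else min(time_slice, need)
            clock += run
            remaining[job] -= run
            if remaining[job] == 0:
                finished[job] = clock
    return finished

print(completion_times([3, 5]))                # serial
print(completion_times([3, 5], time_slice=1))  # interleaved
```

Both runs finish the last job at time 8, so the total adds up either way.
What changes is when the first job completes: at 3 when run serially, at
5 when interleaved.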


From Lawrence D'Oliveiro@21:1/5 to Waldek Hebisch on Fri May 23 05:35:12 2025

On Fri, 23 May 2025 00:18:53 -0000 (UTC), Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer is
useful.

Only if you don’t take the statistics of real-world cache behaviour into account.

However, the real question is about size: how big should it be.

It can never be big enough to make a difference.


From MitchAlsup1@21:1/5 to BGB on Fri May 23 12:36:15 2025

On Fri, 23 May 2025 6:09:37 +0000, BGB wrote:

On 5/23/2025 12:35 AM, Lawrence D'Oliveiro wrote:

On Fri, 23 May 2025 00:18:53 -0000 (UTC), Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer is
useful.

Only if you don’t take the statistics of real-world cache behaviour into account.

However, the real question is about size: how big should it be.

It can never be big enough to make a difference.

Sometimes, it is not the size of the cache that matters.

Say, for example, a typical cache configuration in my core:
L1 D$: 32K, direct mapped
L1 I$: 16K, direct mapped
L2: 256K, direct mapped.

OK, so say I stick a 4K cache between the L1 and L2 caches.
Seems kinda useless just based on sizes.

It is called a "victim buffer"

Except: This small cache is 4-way set associative and so can absorb a
bunch of conflict misses, and notably reducing the number of cache
misses in the L2 cache. It can bring a performance benefit, despite
being small.

And generally fully associative

Why?

Because it helps.
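
A toy simulation of the effect, as a sketch only. It assumes a
direct-mapped cache indexed by line address modulo the number of lines,
and a fully associative victim buffer with LRU replacement. The sizes and
the access pattern are made up for illustration and do not model any
particular core.

```python
from collections import OrderedDict

def count_misses(addresses, cache_lines, victim_entries):
    """Count misses that go to the next level for a direct-mapped cache
    backed by a small fully associative, LRU-managed victim buffer."""
    cache = [None] * cache_lines          # one line address per index
    victims = OrderedDict()               # line address -> None, oldest first
    misses = 0
    for line in addresses:
        index = line % cache_lines
        if cache[index] == line:
            continue                      # hit in the main cache
        if line in victims:
            del victims[line]             # hit in the victim buffer
        else:
            misses += 1                   # must fetch from the next level
        evicted, cache[index] = cache[index], line
        if evicted is not None and victim_entries > 0:
            victims[evicted] = None       # evicted line becomes a victim
            if len(victims) > victim_entries:
                victims.popitem(last=False)   # drop the oldest victim
    return misses

# Two line addresses that collide on the same index, accessed alternately.
pattern = [0, 64] * 8
print(count_misses(pattern, cache_lines=64, victim_entries=0))
print(count_misses(pattern, cache_lines=64, victim_entries=4))
```

Without the buffer every access in this ping-pong pattern misses; with
four victim entries only the two compulsory misses remain, even though
the buffer is tiny compared with the cache it sits behind.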


From Scott Lurndal@21:1/5 to BGB on Fri May 23 13:20:31 2025

BGB <[email protected]> writes:

On 5/22/2025 5:41 PM, Lawrence D'Oliveiro wrote:

On Thu, 22 May 2025 12:39:00 -0500, BGB wrote:

But, yeah, this was apparently something that my mom grabbed from a
dumpster sometime around 20-25 years ago, and its fate was that parents
later returned it to a dumpster.

Don’t you have regulations, or at least discouragements, against e-waste
going to landfill?

Yes, in the last two decades. A quarter century ago, not so much.

This is 'Murica, pretty much everything goes in the trash here...

Nonsense.

Old computers, old electronics, batteries, ...
Paper, plastic, used motor oil, ...

It all goes in the trash...

Nonsense.


From Scott Lurndal@21:1/5 to Scott Lurndal on Fri May 23 14:18:24 2025

[email protected] (Scott Lurndal) writes:

BGB <[email protected]> writes:

On 5/22/2025 5:41 PM, Lawrence D'Oliveiro wrote:

On Thu, 22 May 2025 12:39:00 -0500, BGB wrote:

But, yeah, this was apparently something that my mom grabbed from a
dumpster sometime around 20-25 years ago, and its fate was that parents
later returned it to a dumpster.

Don’t you have regulations, or at least discouragements, against e-waste
going to landfill?

Yes, in the last two decades. A quarter century ago, not so much.

This is 'Murica, pretty much everything goes in the trash here...

Nonsense.

Old computers, old electronics, batteries, ...
Paper, plastic, used motor oil, ...

It all goes in the trash...

Nonsense.

To elaborate. Batteries are collected by retailers (drop 'em off at Ace
Hardware). The solid waste company that picks up the trash provides 5gal
jugs for used motor oil and bags for used filters (which can also be
dropped off at any auto parts store). Paper, plastic, metals (other than
styrofoam) are collected and recycled. Old electronics are recycled.
Office supply stores accept used printer ink cartridges. Best Buy accepts
electronics for recycling, as do various recyclers that offer drop-off
locations.


From Stephen Fuld@21:1/5 to Waldek Hebisch on Fri May 23 08:28:44 2025

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

Stephen Fuld <[email protected]d> wrote:

On 5/13/2025 1:12 AM, Lawrence D'Oliveiro wrote:

On Tue, 13 May 2025 07:40:35 GMT, Anton Ertl wrote:

In this case the drive knows things that the OS does not: Consider that
the OS asked for sector N. What happens when the arm finally has settled
enough to read from the track, and the first sector it sees is sector
N+10?

The OS isn’t likely to ask for one sector at a time.

Frequently true, so consider this related scenario. The host requests a
read of 10 sectors starting at sector N. When the head settles, the
next sector is N+6. Without any in drive buffering, it would wait
almost a full revolution till record N comes under the head.

With buffering, but no cache, the drive reads record N+5 to N+9 into the
buffer, then waits until the drive rotates to record N and begins the
host transfer. This is an improvement because the transfer to the host
is faster than the transfer from the disk, and the last 3 sectors can be
transferred out of the buffer without waiting for the disk, so the
transfer is completed faster.

Now consider with caching. Similar, but after record N+9, the drive
continues reading into the cache. Let's say there are 30 records on this
track. If it reads all of the data into the cache, then proceeds as
above once the disk rotates to record N, it has cost zero time. And if
the host then issues another 10-sector read sequential to the initial
one (or actually for any sectors from N+10 to N+29), this can be satisfied
out of the cache without any drive delay, so much faster than without
the cache, and the heads can be moved away to start satisfying another
unrelated request. There is minimal cost and substantial benefit.

Now you have argued that the file system cache should take care of that,
presumably issuing prefetch reads for the next sectors. This will work,
of course, but has some disadvantages relative to using the drive cache.
Specifically, since it is unlikely the prefetch request will be
received by the drive before record N+10 has passed the heads, it will
incur most of an additional rotational delay, which will tie up the
drive, preventing it from responding to some other request.

No one is arguing that host based file caches are bad. It is simply the
fact that there are situations where drive caches are a useful addition,
and since the drive has to have some DRAM anyway for other reasons, the
cost is minimal. You can think of the drive cache as the "next level"
cache behind the host based cache.

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer "operations" than the 80-150 you get when just including rotations.

And if you are caching writes, more cache gives you more blocks to
choose from when optimizing the write back order, which reduces the time
to write them all back. The larger DRAM is a small component of drive
cost, so the manufacturers think it is worth including more.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From jseigh@21:1/5 to Scott Lurndal on Fri May 23 12:34:43 2025

On 5/23/25 10:18, Scott Lurndal wrote:

To elaborate. Batteries are collected by retailers (drop 'em off at Ace Hardware).
The solid waste company that picks up the trash provides 5gal jugs
for used motor oil and bags for used filters (which can also be dropped
off at any auto parts store). Paper, plastic, metals (other than styrofoam) are collected and recycled. Old electronics are recycled. Office supply stores accept used printer ink cartridges. Best buy accepts electronics
for recycling as do various recyclers that offer drop-off locations.

Here, Staples will take most stuff for recycling though they charge for computer monitors.

Apple takes Apple stuff. I recycled an old PPC mini, and a 2010
mini that I had installed ChromeOS Flex on though it was too slow.
Way too hard to replace the hdd with an ssd on the 2010 mini.

ChromeOS Flex is a nice way to keep old hardware running if you
can put in an ssd. I installed it on an old Lenovo laptop that even
Linux complained it was going to stop supporting the cpu. So
now it's a chromebook, a really heavy chromebook.

I had a chromebox and a chromebook that had gone out of update support,
the old 5 year support. You have to replace the firmware on those which
is a one shot try deal (unless you have an eeprom programmer). I
bricked the chromebox (I think I know why) but the chromebook worked.
Sort of weird since the OEMs just weasel out of supporting stuff that
still works. They should be forced to take back stuff that still
works and that they won't provide firmware updates for.

You can also upgrade those Windows 10 boxes that MS says can't be
upgraded to Windows 11. I upgraded a really old box that didn't even
have a TPM module. There are articles online on how to do that.

Joe Seigh


From Stefan Monnier@21:1/5 to All on Fri May 23 17:03:02 2025

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer "operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

And if you are caching writes, more cache gives you more blocks to choose from when optimizing the write back order, which reduces the time to write them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

The larger DRAM is a small component of drive cost, so the
manufacturers think it is worth including more.

In some markets (e.g. home routers), the size of DRAM seems to be enough
of a cost factor that it took many years until reaching 256MBs, even
though those boxes *need* that RAM for all kinds of purposes (the 128MB
of my current home-router seems to be its main source of instability).
But HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

Stefan


From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Fri May 23 22:19:45 2025

On Fri, 23 May 2025 08:28:44 -0700, Stephen Fuld wrote:

And if you are caching writes, more cache gives you more blocks to
choose from when optimizing the write back order ...

Fun fact: It seems to be part of the spec that write cache reordering is disabled if the driver is using NCQ.

Assuming drives pay attention to that part of the spec, of course.

See the description of the “wcreorder” setting here <https://manpages.debian.org/smartctl(8)>.


From EricP@21:1/5 to Lawrence D'Oliveiro on Fri May 23 20:51:29 2025

Lawrence D'Oliveiro wrote:

On Fri, 23 May 2025 08:28:44 -0700, Stephen Fuld wrote:

And if you are caching writes, more cache gives you more blocks to
choose from when optimizing the write back order ...

Fun fact: It seems to be part of the spec that write cache reordering is disabled if the driver is using NCQ.

Assuming drives pay attention to that part of the spec, of course.

See the description of the “wcreorder” setting here <https://manpages.debian.org/smartctl(8)>.

Referring to the wcreorder flag, it doesn't say it disables reordering.
It says "The state of Write Cache Reordering has no effect on either
NCQ or LCQ queued commands.". SATA has separate commands for queued
reads and writes and non-queued. It sounds like that flag controls
whether non-queued commands are also reordered or not.
By using a queued command you are giving permission to reorder that command.

Note that most HDD drivers also do reordering (elevator algorithm).
I don't know if they disable it for SATA drives with NCQ but I see
no reason to as NCQ only handles up to 32 commands whereas driver
reorder optimization applies to the whole disk queue.
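
For reference, a minimal sketch of elevator-style ordering. It assumes
block addresses are a reasonable proxy for head position and uses a
made-up queue; real drivers and drive firmware also account for rotation,
deadlines and fairness, none of which is modeled here. The function names
are hypothetical.

```python
def elevator_order(pending, head):
    """Order pending block addresses as one upward sweep from `head`,
    followed by a downward sweep over what is left (a basic SCAN)."""
    upward = sorted(b for b in pending if b >= head)
    downward = sorted((b for b in pending if b < head), reverse=True)
    return upward + downward

def travel(order, head):
    """Total head movement, in block-address units, to service `order`."""
    distance = 0
    for block in order:
        distance += abs(block - head)
        head = block
    return distance

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(travel(queue, head=53))                       # arrival order
print(travel(elevator_order(queue, 53), head=53))   # elevator order
```

With this queue the elevator order cuts head travel from 640 to 299
units. The gain tends to grow with the number of requests available to
sort, which is the argument for reordering over the whole driver queue
rather than only 32 NCQ slots.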


From Lawrence D'Oliveiro@21:1/5 to BGB on Sat May 24 03:17:00 2025

On Fri, 23 May 2025 11:29:03 -0500, BGB wrote:

On 5/23/2025 7:36 AM, MitchAlsup1 wrote:

On Fri, 23 May 2025 6:09:37 +0000, BGB wrote:

OK, so say I stick a 4K cache between the L1 and L2 caches.
Seems kinda useless just based on sizes.

It is called a "victim buffer"

I had usually understood it that a "victim buffer" was typically glued directly to the L1 cache.

So let me understand this: the analogous concept to a “victim buffer” in the disk drive case (assuming the analogy makes sense at all), would be something “glued directly” to the OS filesystem cache? That is, it would
be on the main RAM side, not on the drive side? In order for the analogy
to be truly analogous?

Just trying to get things clear here. Because, you know, some people like
to muddy the waters to distract from the fact that they don’t actually understand what’s going on.


From Stephen Fuld@21:1/5 to Stefan Monnier on Sat May 24 09:23:48 2025

On 5/23/2025 2:03 PM, Stefan Monnier wrote:

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer
"operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

Yes, I didn't express myself well. :-( And once again, I have to say
that my information may be obsolete.

I think it is useful to separate talking about read data from write
data. For read data, as with any cache, more is always better than
less, though with diminishing returns. Why pick 1.25 sec as the "cut
off point"? If the host re-references data that it hasn't read for say
3 seconds, having it in cache still saves, probably a seek time and on
average 1/2 rotation time. Plus, it means the heads will be free to
handle other requests. All of this is standard cache benefits. I see
no reason to limit the cache size and reduce this benefit.
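To put rough numbers on the figures being argued about, here is a minimal sketch. The 256 MB and 200 MB/s come from the quoted text; the spindle speeds (5400 and 7200 rpm) are my assumption of typical values and are not stated anywhere above.

```python
# Rough arithmetic: how much drive time does a 256 MB cache represent?
CACHE_MB = 256      # advertised cache size (from the thread)
RATE_MB_S = 200     # optimistic sustained transfer rate (from the thread)


def fill_time_s(cache_mb=CACHE_MB, rate_mb_s=RATE_MB_S):
    """Seconds of media transfer needed to fill the cache once."""
    return cache_mb / rate_mb_s


def rotations(rpm, seconds):
    """Platter rotations completed in `seconds` at spindle speed `rpm`."""
    return rpm / 60 * seconds


if __name__ == "__main__":
    t = fill_time_s()
    print(f"fill time: {t:.2f} s")
    for rpm in (5400, 7200):    # assumed spindle speeds, not from the thread
        print(f"{rpm} rpm: about {rotations(rpm, t):.0f} rotations")
```

Under those assumptions the cache holds about 1.28 s of transfer, or roughly 115 to 154 rotations, which is the same ballpark as the 1.25 s and 80-150 rotations quoted.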

And if you are caching writes, more cache gives you more blocks to choose
from when optimizing the write back order, which reduces the time to write
them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

For write data, I was unaware of the 32 operation limit. I was used to
SCSI, which, IIRC was larger, and for server type applications, where
some sort of UPS is more common, the site may choose to enable write
caching in the disk. For a disk vendor, given the small cost of the
DRAM, it is an easy choice.
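A quick sketch of the point about the 32-command limit: if the drive only holds data for in-flight commands (no write-back caching), the buffer it needs is bounded by 32 times the per-command size. Only the limit of 32 and the 256 MB figure come from the thread; the per-command sizes below are hypothetical, picked just to show the scale.

```python
# Upper bound on write data held by a drive that only buffers in-flight
# commands (no write-back caching).
NCQ_DEPTH = 32                  # in-flight command limit quoted in the thread
CACHE_BYTES = 256 * 2**20       # 256 MB cache, treated here as MiB


def in_flight_bytes(bytes_per_command, depth=NCQ_DEPTH):
    """Most data that can be outstanding at once."""
    return depth * bytes_per_command


def command_size_to_fill(cache_bytes=CACHE_BYTES, depth=NCQ_DEPTH):
    """Per-command size at which in-flight data alone fills the cache."""
    return cache_bytes // depth


if __name__ == "__main__":
    for size in (4 * 2**10, 128 * 2**10, 2**20):    # hypothetical sizes
        mib = in_flight_bytes(size) / 2**20
        print(f"{size // 2**10:5d} KiB/command -> {mib:g} MiB in flight")
    print(f"cache fills at {command_size_to_fill() // 2**20} MiB/command")
```

So in-flight data alone only fills a 256 MB cache if every one of the 32 commands carries about 8 MiB, which is the "multi-MB chunks" case raised in the bracketed question.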

The larger DRAM is a small component of drive cost, so the
manufacturers think it is worth including more.

In some markets (e.g. home routers), the size of DRAM seems to be enough
of a cost factor that it took many years until reaching 256MBs, even
though those boxes *need* that RAM for all kinds of purposes (the 128MB
of my current home-router seems to be its main source of instability).
but HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

I can't comment on routers, but for disks, while the cost of the disk
may not have come down, increasing capacity allows reduced cost per
gigabyte. A substantial portion of the cost is not subject to Moore's
law (e.g. drive motor, magnets and arm assembly, etc.) and some capacity increasing technologies cost more (but not enough more to overwhelm the capacity advantage).

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Waldek Hebisch@21:1/5 to Stephen Fuld on Sun May 25 18:05:18 2025

Stephen Fuld <[email protected]d> wrote:

On 5/23/2025 2:03 PM, Stefan Monnier wrote:

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer
"operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

Yes, I didn't express myself well. :-( And once again, I have to say
that my information may be obsolete.

I think it is useful to separate talking about read data from write
data. For read data, as with any cache, more is always better than
less, though with diminishing returns. Why pick 1.25 sec as the "cut
off point"? If the host re-references data that it hasn't read for say
3 seconds, having it in cache still saves, probably a seek time and on average 1/2 rotation time. Plus, it means the heads will be free to
handle other requests. All of this is standard cache benefits. I see
no reason to limit the cache size and reduce this benefit.

We are talking here about common case, that is when disc is accessed
via OS cache. OS cache is significantly larger than disc cache, so
hit ratio for data sent to host is going to be quite low. Disc
cache has an advantage: it gets "for free" some data that host did
not request. But it is rather unlikely that keeping such data
for long time has significant advantage.

And if you are caching writes, more cache gives you more blocks to choose
from when optimizing the write back order, which reduces the time to write
them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

For write data, I was unaware of the 32 operation limit. I was used to
SCSI, which, IIRC was larger, and for server type applications, where
some sort of UPS is more common, the site may choose to enable write
caching in the disk. For a disk vendor, given the small cost of the
DRAM, it is an easy choice.

I have not looked at the details of the disc protocol. But with a protocol
done right, the host would first transfer commands and then deliver data
in the order requested by the drive. So most buffering would be in
the host and the disc would need just enough buffering to ensure
smooth transmission and a low interrupt rate. 4 tracks look like
plenty for this purpose.

The larger DRAM is a small component of drive cost, so the
manufacturers think it is worth including more.

In some markets (e.g. home routers), the size of DRAM seems to be enough
of a cost factor that it took many years until reaching 256MBs, even
though those boxes *need* that RAM for all kinds of purposes (the 128MB
of my current home-router seems to be its main source of instability).
but HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

I can't comment on routers, but for disks, while the cost of the disk
may not have come down, increasing capacity allows reduced cost per
gigabyte. A substantial portion of the cost is not subject to Moore's
law (e.g. drive motor, magnets and arm assembly, etc.) and some capacity increasing technologies cost more (but not enough more to overwhelm the capacity advantage).

In the nineties I read that for motherboard manufacturers 1 cent was
"negligible", but 10 cents was significant: in volume transactions
margins were low and no party was willing to absorb a 10 cent
per piece "loss". Discs are probably less competitive than
motherboards, but I would expect adding 256 MB to lead to a 1
dollar or more increase in cost.

So IMO it is highly unclear why manufacturers use large caches.
One possible explanation could be benchmarketing and using
obsolete benchmarks. Another could be inertia with customers
thinking that "larger cache is better".

Another thing is fragmenting the market into different "kinds" of
drives. Rationally, high performance drives should get
better mechanical parts. But in given performance area there
seem to be no reason for different mechanics, so I suspect
that they use the same. They may get different firmware.
"Green" consumer parts seem to be quite aggressive powering
down (IIUC on recent WD parts it is impossible to permanently
disable this), but beyond this it is not clear to me if there
are rational reasons for significantly different firmware.

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Waldek Hebisch on Sun May 25 12:13:08 2025

On 5/25/2025 11:05 AM, Waldek Hebisch wrote:

Stephen Fuld <[email protected]d> wrote:

On 5/23/2025 2:03 PM, Stefan Monnier wrote:

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer
"operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

Yes, I didn't express myself well. :-( And once again, I have to say
that my information may be obsolete.

I think it is useful to separate talking about read data from write
data. For read data, as with any cache, more is always better than
less, though with diminishing returns. Why pick 1.25 sec as the "cut
off point"? If the host re-references data that it hasn't read for say
3 seconds, having it in cache still saves, probably a seek time and on
average 1/2 rotation time. Plus, it means the heads will be free to
handle other requests. All of this is standard cache benefits. I see
no reason to limit the cache size and reduce this benefit.

We are talking here about common case, that is when disc is accessed
via OS cache. OS cache is significantly larger than disc cache, so
hit ratio for data sent to host is going to be quite low. Disc
cache has an advantage: it gets "for free" some data that host did
not request. But it is rather unlikely that keeping such data
for long time has significant advantage.

And if you are caching writes, more cache gives you more blocks to choose
from when optimizing the write back order, which reduces the time to write
them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

For write data, I was unaware of the 32 operation limit. I was used to
SCSI, which, IIRC was larger, and for server type applications, where
some sort of UPS is more common, the site may choose to enable write
caching in the disk. For a disk vendor, given the small cost of the
DRAM, it is an easy choice.

I have not looked at the details of the disc protocol. But with a protocol
done right, the host would first transfer commands and then deliver data
in the order requested by the drive. So most buffering would be in
the host and the disc would need just enough buffering to ensure
smooth transmission and a low interrupt rate. 4 tracks look like
plenty for this purpose.

No, when the disk receives a write command, it accepts the write data immediately (up to some large limit). That way, when the heads settle
on the track, if the disk happens to be positioned in the middle of the transfer, it can write the last part of the data to the disk
immediately, then wait for the disk to spin to where the transfer starts
to finish transferring the first part of the write data. This
reduces average latency, i.e. improves performance.

The larger DRAM is a small component of drive cost, so the
manufacturers think it is worth including more.

In some markets (e.g. home routers), the size of DRAM seems to be enough
of a cost factor that it took many years until reaching 256MBs, even
though those boxes *need* that RAM for all kinds of purposes (the 128MB
of my current home-router seems to be its main source of instability).
but HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

I can't comment on routers, but for disks, while the cost of the disk
may not have come down, increasing capacity allows reduced cost per
gigabyte. A substantial portion of the cost is not subject to Moore's
law (e.g. drive motor, magnets and arm assembly, etc.) and some capacity
increasing technologies cost more (but not enough more to overwhelm the
capacity advantage).

In nineties I read that for motherboard manufactures 1 cent was
"negligible", but 10 cents was significant: In volume transactions
margins were low and no party were willing to absorb 10 cents
per piece "loss". Discs probably are less competitive than
motherboards, but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

I can't comment on your specific numbers, but assuming you are right,
adding $1 to the cost is small, at least in the part of the market I
was familiar with. And remember, you are not "adding" 256MB, as some of
that is needed for various internal operations.

So IMO it is highly unclear why manufacturers use large caches.
One possible explanation could be benchmarketing and using
obsolete benchmarks.

Perhaps, but disk manufacturers are very sensitive to whatever
benchmarks their customers use.

Another could be inertia with customers
thinking that "larger cache is better".

Sure.

Another things is fragmenting market into different "kinds" of
drives. Rationally, high performance drives should get
better mechanical parts. But in given performance area there
seem to be no reason for different mechanics, so I suspect
that they use the same. They may get different firmware.
"Green" consumer parts seem to be quite aggressive powering
down (IIUC on recent WD parts it is impossible to permanently
disable this), but beyond this it is not clear to me if there
are rational reasons for significantly different firmware.

My knowledge is too obsolete to know about any of these, but another possibility is more extensive burn in for more expensive drives.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to All on Sun May 25 19:36:01 2025

$1 added to a $200 drive is invisible
$1 added to a $20 drive is massive.

{at my current rate of space consumption, 1TB lasts a bit over 8 years}

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lars Poulsen@21:1/5 to Stefan Monnier on Sun May 25 20:55:43 2025

On 2025-05-23, Stefan Monnier <[email protected]> wrote:

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

As I see it, with variable track length geometries, the OS file
system cannot make reasonable assumptions about track boundaries, so it
cannot maintain track/cylinder caches, but the drive processor can.

...
but HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

Actually, HDDs are still on a steep downward price curve. The amount of
HDD space you get for USD 100 keeps going up, up, up. Look at portable
backup drives.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Monnier@21:1/5 to All on Sun May 25 19:16:14 2025

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle on
the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk immediately,
then wait for the disk to spin to where the transfer starts to finish
transferring the first part of the write data. This reduces average
latency, i.e. improves performance.

Really? I had the impression that it would be very hard to start writing
from the middle of a sector because of the need to be sure exactly where
in the sector we are. IOW, the drive needs to see the inter-sector markers before it can start writing to a sector.

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Stefan Monnier on Sun May 25 16:27:28 2025

On 5/25/2025 4:16 PM, Stefan Monnier wrote:

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle on
the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk immediately,
then wait for the disk to spin to where the transfer starts to finish
transferring the first part of the write data. This reduces average
latency, i.e. improves performance.

Really? I had the impression that it would be very hard to start writing from the middle of a sector because of the need to be sure exactly where
in the sector we are. IOW, the drive needs to see the inter-sector markers before it can start writing to a sector.

I am not sure what you mean by sector. If you mean a disk block,
typically used to be 512 bytes, now typically 4K bytes, then you are
right that you can't start writing in the middle of one. But a track
typically has many blocks and you can start writing at any one of them.
So if a write is say 16K bytes and you have 4K blocks on the disk,
then the drive could start at say the second 4K block, complete the last
12K of the transfer, wait for the rotation then finish the first 4K block.
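The 16K example can be sketched as a toy model. This is only an illustration under simplifying assumptions of mine (one track, time counted in whole block-times, and a hypothetical 100 blocks per track); it is not how any particular drive firmware works.

```python
def write_order(first_block, n_blocks, head_block, blocks_per_track):
    """Order in which a fully buffered multi-block write can reach the media.

    The request covers blocks first_block .. first_block + n_blocks - 1 on
    one track. head_block is the next block to pass under the head once the
    seek has settled. Blocks are written as they come around.
    """
    wanted = {(first_block + i) % blocks_per_track for i in range(n_blocks)}
    order = []
    for step in range(blocks_per_track):        # one full revolution
        pos = (head_block + step) % blocks_per_track
        if pos in wanted:
            order.append(pos)
    return order


def buffered_finish(first_block, n_blocks, head_block, blocks_per_track):
    """Block-times until the last requested block has been written."""
    return max((first_block + i - head_block) % blocks_per_track
               for i in range(n_blocks)) + 1


def in_order_finish(first_block, n_blocks, head_block, blocks_per_track):
    """Block-times if the drive must wait for first_block before starting."""
    return (first_block - head_block) % blocks_per_track + n_blocks


if __name__ == "__main__":
    # 16K write as four 4K blocks (0..3); the head settles just before block 1.
    print(write_order(0, 4, 1, 100))        # [1, 2, 3, 0]
    print(buffered_finish(0, 4, 1, 100))    # 100
    print(in_order_finish(0, 4, 1, 100))    # 103
```

When the head lands outside the requested range the two finish times are equal, so in this model the gain only shows up when the heads settle in the middle of the transfer.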

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Stephen Fuld on Mon May 26 03:01:28 2025

On Sun, 25 May 2025 23:27:28 +0000, Stephen Fuld wrote:

On 5/25/2025 4:16 PM, Stefan Monnier wrote:

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle on
the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk immediately,
then wait for the disk to spin to where the transfer starts to finish
transferring the first part of the write data. This reduces average
latency, i.e. improves performance.

Really? I had the impression that it would be very hard to start writing
from the middle of a sector because of the need to be sure exactly where
in the sector we are. IOW, the drive needs to see the inter-sector markers
before it can start writing to a sector.

I am not sure what you mean by sector. If you mean a disk block,
typically used to be 512 bytes, now typically 4K bytes, then you are
right that you can't start writing in the middle of one. But a track typically has many blocks and you can start writing at any one of them.

It used to be that the heads were in read-mode looking for the sector
start symbol (preamble), before starting to write a sector.

So if a write is say 16K bytes and you have 4K blocks on the disk,
then the drive could start at say the second 4K block, complete the last
12K of the transfer, wait for the rotation then finish the first 4K
block.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Monnier@21:1/5 to All on Sun May 25 23:58:04 2025

I am not sure what you mean by sector. If you mean a disk block, typically used to be 512 bytes, now typically 4K bytes, then you are right that you can't start writing in the middle of one.

Yes, that's what I meant.

But a track typically has many blocks and you can start writing at any
one of them. So if a write is say 16K bytes and you have 4K blocks
on the disk, then the drive could start at say the second 4K block,
complete the last 12K of the transfer, wait for the rotation then
finish the first 4K block.

Oh, you were talking about a multi-block write command.
It all makes sense now, thank you,

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Monnier@21:1/5 to All on Mon May 26 00:07:36 2025

As I see it, with variable track length geometries, the OS file
system cannot make reasonable assumptions about track boundaries, so it cannot maintain track/cylinder caches, but the drive processor can.

IIUC, the 2D aspect of track/cylinder info is not sufficiently important
at that level, so the 1D approximation of it you get with current LBA addressing gets you most of the benefit, and a small amount of buffering
(with a couple of tracks' storage) is sufficient to smooth out
the difference.

...
but HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

Actually, HDDs are still on a steep downward price curve.

Downward maybe, but steep?

AFAICT, a 2½" 2TB drive still costs me about CAD$115.00 (and the largest
one I can find is only 5TB) whereas I bought one in (early) 2013 for
CAD$176.29.

Admittedly, the 2013 one was 15mm thick, but it still says something
about the price curve, I think.

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Mon May 26 07:13:13 2025

On Sun, 25 May 2025 12:13:08 -0700, Stephen Fuld wrote:

No, when the disk receives a write command, it accepts the write data immediately (up to some large limit). That way, when the heads settle
on the track, if the disk happens to be positioned in the middle of the transfer, it can write the last part of the data to the disk
immediately, then wait for the disk to spin to where the transfer starts
to finish transferring the first part of the write data. This
reduces average latency, i.e. improves performance.

But it requires reordering of writes.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Waldek Hebisch on Mon May 26 06:46:01 2025

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Lars Poulsen on Mon May 26 07:13:01 2025

Lars Poulsen <[email protected]> writes:

Actually, HDDs are still on a steep downward price curve. The amount of
HDD space you get for USD 100 keeps going up, up, up.

Actually, we have had a plateau for almost a decade where the density
hardly grew: In <[email protected]> I wrote:

|Robert Wessel <[email protected]> writes:
[...]

Short term variations of growth are hard to deal with in any sort of
projection, but if you look at the time between 1TB and 10TB disks,
January 2007 through December 2015, or 106 months, you get a doubling
once every 31.9 months (that wasn't actually the source for my 30 year
estimate, but it happens to work out about the same way).

|
|Looking closer at that development, we see a growth to 3TB until
|August 2010 (23.3 months per doubling), from 3TB to 4TB until
|September 2011 (31.3 months per doubling), and 38.6 months per
|doubling from 4TB to 10TB.
|
|And if we look at the development since then, we got a 14TB SMR drive
|in October 2017 (45.3 months per doubling from 10TB to 14TB); this
|drive is no longer being sold; all drive manufacturers now sell PMR
|drives. No further growth has happened in the last year.

Growth until about a year ago or so was even slower:

size since
16TB 2019
18TB 2020
20TB 2021
22TB 2022
24TB 2023
26TB 2024

(Method: looking at the "gelistet seit" entries on <https://geizhals.eu/?cat=hde7s>).
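The "months per doubling" figures can be reproduced with a short sketch: elapsed months divided by log2 of the capacity ratio. The capacities and dates are the ones given above; the month counts between dates (other than the 106 quoted) are my own counting, and the last row uses whole years only because the table gives no months.

```python
from math import log2


def months_per_doubling(cap_from_tb, cap_to_tb, months):
    """Elapsed months divided by the number of capacity doublings."""
    return months / log2(cap_to_tb / cap_from_tb)


if __name__ == "__main__":
    # (from TB, to TB, elapsed months)
    spans = [
        (1, 10, 106),   # Jan 2007 -> Dec 2015, 106 months as quoted
        (3, 4, 13),     # Aug 2010 -> Sep 2011
        (4, 10, 51),    # Sep 2011 -> Dec 2015
        (10, 14, 22),   # Dec 2015 -> Oct 2017
        (16, 26, 60),   # 2019 -> 2024, whole years only (my rounding)
    ]
    for a, b, m in spans:
        rate = months_per_doubling(a, b, m)
        print(f"{a:2d}TB -> {b:2d}TB in {m:3d} months: "
              f"{rate:.1f} months per doubling")
```

The first four rows come out at 31.9, 31.3, 38.6 and 45.3, matching the quoted figures. The last row, from the table above, works out to roughly 86 months per doubling under the whole-year assumption.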

And consequently, the price per GB has not gone down much. E.g.,
looking at the cheapest 16TB drive today, its price started with an
early-adopter price of EUR 529 on 2019-07-11, went down to EUR 310 by
2021-03-07, and in the last 4 years only went down to EUR 240 (with a
low point of EUR 213 last summer)
<https://geizhals.eu/?phist=2068349&age=9999>. So not a particularly
steep slope in the last 4 years.

Supposedly the drive manufacturers have now finally managed to make
HAMR technology fit for the mass-market, and they promise much faster
growth during the next decade; and that would consequently lead to
getting a faster growth in the amount of HDD space one gets for the
money. We will see if that really works out.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to Anton Ertl on Mon May 26 16:06:58 2025

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for prices of
such devices on Mouser I see the following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that a disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists a measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.
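For what it's worth, the size of each capacity step under those assumptions can be sketched as follows. The three prices and the 1.5x to 2x discount range are the ones given above; treating the discount as a flat divisor on the distributor price is a simplification of mine.

```python
# Distributor prices quoted above, keyed by chip density in Gbit.
PRICE_USD = {1: 2.10, 2: 2.60, 4: 2.90}


def megabytes(gbit):
    """Chip capacity in MB (1 Gbit = 128 MB)."""
    return gbit * 1024 // 8


def step_costs(discount):
    """Extra cost of each capacity step if the buyer pays price / discount."""
    sizes = sorted(PRICE_USD)
    return {
        (megabytes(a), megabytes(b)): (PRICE_USD[b] - PRICE_USD[a]) / discount
        for a, b in zip(sizes, sizes[1:])
    }


if __name__ == "__main__":
    for discount in (1.0, 1.5, 2.0):    # 1.0 = list price, for reference
        for (small, big), usd in step_costs(discount).items():
            print(f"discount {discount}x: {small} MB -> {big} MB "
                  f"adds about ${usd:.2f}")
```

With a 2x discount the steps come to about $0.25 (128 to 256 MB) and $0.15 (256 to 512 MB), i.e. well under a dollar but clearly more than a cent.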

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Mon May 26 07:31:05 2025

On 5/26/2025 12:13 AM, Lawrence D'Oliveiro wrote:

On Sun, 25 May 2025 12:13:08 -0700, Stephen Fuld wrote:

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle
on the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk
immediately, then wait for the disk to spin to where the transfer starts
to finish transferring the first part of the write data. This
reduces average latency, i.e. improves performance.

But it requires reordering of writes.

No, it doesn't. In my earlier post, I showed how with just using the
DRAM for buffering, not caching, it is still advantageous to take the
write data with the command, as it may allow you to complete the write
faster if, when the heads come on cylinder, it happens to be in the
middle of the transfer.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Michael S on Mon May 26 16:30:20 2025

Michael S <[email protected]> writes:

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for prices of
such devices on Mouser I see the following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that a disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists a measurable difference between 128, 256
and 512 MB.

My experience with Mouser prices suggests a much higher factor (but I
am too lazy to look up Chinese dealers on Aliexpress), and their
prices do not necessarily reflect a constant factor markup above their
buying price.

The business model of PC-part dealers has less markup than Mouser,
therefore I think the DIMM prices are a better indicator of what DRAM
chips cost.

Concerning x16 chips vs. x8 chips, I don't expect them to have much
difference in cost, and in a highly competitive market like DRAM,
consequently not much difference in price. And in any case, the
difference will be a constant cost offset for different DRAM sizes.

Ok, looking at DDR3 DIMM prices and discounting dealers that are far
cheaper than the mainstream (those dealers are probably just selling
off DIMMs that they got for cheap from OEM surplus or somesuch), I see
the following prices <https://geizhals.eu/?cat=ramddr3&xf=1454_1024%7E1454_2048%7E1454_4096%7E256_1x%7E5828_DDR3>

DIMM chip EUR
2GB 8x2Gb 10 https://geizhals.eu/v7-dimm-2gb-v7128002gbd-a1528958.html
4GB 8x4Gb 12 https://geizhals.eu/patriot-signature-line-so-dimm-4gb-psd34g16002s-a624676.html

Only 2 1GB DIMMs are offered, by one dealer each, the cheapest at EUR
17. So it apparently no longer pays off to produce them; the cost
advantage of the smaller chips is too small.

Looking at the price difference between 2GB and 4GB DIMMs, this would
indicate a difference of EUR 0.25 per chip. But of course that
includes the distribution cost and VAT, so the difference for the
manufacturer may be in the area of EUR 0.1.

While I am at it: On <https://geizhals.eu/?cat=hde7s> I see for HDDs:

#drives  cache size  #drives >=20TB  #drives 10TB-18TB
1585 >=2MB
1579 >=8MB
1397 >=16MB
1248 >=32MB
1150 >=64MB
863 >=128MB 247
599 >=256MB 72 246
172 >=512MB 65 83

So they spend the extra buffer on the newest and most expensive
drives, while older drives often make do with less buffering. My
guess is that this reflects the prices of the DRAM chips at design
time, and there may be some thought given to the target price of the
HDD and the availability of the DRAM chips while the drive will be manufactured.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Waldek Hebisch@21:1/5 to Stephen Fuld on Mon May 26 19:20:58 2025

Stephen Fuld <[email protected]d> wrote:

On 5/25/2025 11:05 AM, Waldek Hebisch wrote:

Stephen Fuld <[email protected]d> wrote:

On 5/23/2025 2:03 PM, Stefan Monnier wrote:

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.
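As a rough check of the arithmetic above, the following sketch converts a buffer size into seconds of streaming and into platter rotations. The spindle speeds of 5400 and 7200 rpm and the function names are assumptions for illustration; only the 256 MB and 200 MB/s figures come from the post.

```python
def buffer_as_rotations(cache_mb, rate_mb_per_s, rpm):
    """Return (seconds of streaming, platter rotations) that a buffer holds."""
    seconds = cache_mb / rate_mb_per_s
    rotations = seconds * rpm / 60.0
    return seconds, rotations


def track_size_mb(rate_mb_per_s, rpm):
    """Data passing under the head in one rotation at the given rate."""
    return rate_mb_per_s * 60.0 / rpm


if __name__ == "__main__":
    for rpm in (5400, 7200):  # assumed spindle speeds
        seconds, rotations = buffer_as_rotations(256, 200, rpm)
        four_tracks = 4 * track_size_mb(200, rpm)
        print(f"{rpm} rpm: {seconds:.2f} s, {rotations:.0f} rotations, "
              f"4 tracks = {four_tracks:.1f} MB")
```

Under these assumptions a 4-track buffer stays below 10 MB, which is the scale of the gap being questioned.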

It's not just the rotations, but the seek time. So your example is fewer
"operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

Yes, I didn't express myself well. :-( And once again, I have to say
that my information may be obsolete.

I think it is useful to separate talking about read data from write
data. For read data, as with any cache, more is always better than
less, though with diminishing returns. Why pick 1.25 sec as the "cut
off point"? If the host re-references data that it hasn't read for say
3 seconds, having it in cache still saves, probably a seek time and on
average 1/2 rotation time. Plus, it means the heads will be free to
handle other requests. All of this is standard cache benefits. I see
no reason to limit the cache size and reduce this benefit.

We are talking here about common case, that is when disc is accessed
via OS cache. OS cache is significantly larger than disc cache, so
hit ratio for data sent to host is going to be quite low. Disc
cache has an advantage: it gets "for free" some data that host did
not request. But it is rather unlikely that keeping such data
for long time has significant advantage.

And if you are caching writes, more cache gives you more blocks to choose
from when optimizing the write back order, which reduces the time to write
them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]
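To put numbers on that bracketed question, here is a small sketch of how much write data 32 queued commands can carry. The queue depth of 32 is from the post; the 128 KiB command size used in the example is only an assumed typical value, and the function names are made up for illustration.

```python
NCQ_QUEUE_DEPTH = 32  # in-flight command limit mentioned above


def queued_write_mb(kib_per_command, queue_depth=NCQ_QUEUE_DEPTH):
    """Write data (MiB) held by the drive if every queued command is a write."""
    return queue_depth * kib_per_command / 1024.0


def kib_per_command_to_fill(cache_mb, queue_depth=NCQ_QUEUE_DEPTH):
    """Command size (KiB) needed for a full queue to occupy the whole cache."""
    return cache_mb * 1024.0 / queue_depth


if __name__ == "__main__":
    print(queued_write_mb(128))          # assumed 128 KiB writes
    print(kib_per_command_to_fill(256))  # size needed to fill 256 MB
```

With the assumed 128 KiB commands a full queue holds 4 MiB; filling 256 MB without write-back caching would take 8 MiB per command.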

For write data, I was unaware of the 32 operation limit. I was used to
SCSI, which, IIRC was larger, and for server type applications, where
some sort of UPS is more common, the site may choose to enable write
caching in the disk. For a disk vendor, given the small cost of the
DRAM, it is an easy choice.

I do not look at details of disc protocol. But with protocol done
right host would first transfer commands and then deliver data
in order requested by the drive. So most buffering would be in
the host and disc would need just enough buffering to ensure
smooth transmission and low interrupt rate. 4 tracks look like
plenty for this purpose.

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle
on the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk
immediately, then wait for the disk to spin to where the transfer starts
to finish transferring the first part of the write data. This
reduces average latency, i.e. improves performance.
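The latency claim can be illustrated with a toy model. This is only a sketch under stated assumptions: the write covers a contiguous arc of one track (a fraction f of a rotation), the head arrives at a uniformly distributed angle, and sector boundaries, seeks and head switches are ignored. The function names are hypothetical.

```python
def in_order_time(theta, f):
    """Rotations to finish a write covering the arc [0, f) of a track when
    the head arrives at angle theta and must begin at the start of the arc."""
    wait_for_start = (1.0 - theta) % 1.0
    return wait_for_start + f


def any_order_time(theta, f):
    """Same, but data already buffered in the drive may be written starting
    wherever the head lands inside the arc."""
    if 0.0 < theta < f:
        # Write [theta, f) at once, coast past the rest of the track, then
        # write [0, theta): the job ends exactly one rotation after arrival.
        return 1.0
    return in_order_time(theta, f)


def average(time_fn, f, samples=10000):
    """Mean completion time over evenly spaced arrival angles in [0, 1)."""
    total = sum(time_fn((i + 0.5) / samples, f) for i in range(samples))
    return total / samples


if __name__ == "__main__":
    for f in (0.25, 0.5, 1.0):
        print(f, average(in_order_time, f), average(any_order_time, f))
```

In this model the saving grows with the square of the transfer length (f*f/2 rotations on average), so a full-track write averages 1.0 rotation instead of 1.5, while short writes gain very little.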

Latency of writes typically is of little importance: the host buffers
several seconds of write data and writes only after a delay
(the rationale is that data may be overwritten, so by delaying
the host may avoid the actual disc operation). To allow scheduling in
the drive one wants commands to be sent as fast as possible.
Sending possibly bulky write data before sending the next command
looks counterproductive. There could be some data that host
wants to be in persistent storage as fast as possible, but
making it the only option clearly would be a design error in
the disc protocol.

The larger DRAM is a small component of drive cost, so the
manufacturers think it is worth including more.

In some markets (e.g. home routers), the size of DRAM seems to be enough
of a cost factor that it took many years until reaching 256MBs, even
though those boxes *need* that RAM for all kinds of purposes (the 128MB
of my current home-router seems to be its main source of instability).
But HDDs are pretty damn expensive beasts nowadays (because prices have
not gone down for the last 10 years or so), so I guess that makes
the relative cost of 512MB of DRAM "negligible"?

I can't comment on routers, but for disks, while the cost of the disk
may not have come down, increasing capacity allows reduced cost per
gigabyte. A substantial portion of the cost is not subject to Moore's
law (e.g. drive motor, magnets and arm assembly, etc.) and some capacity
increasing technologies cost more (but not enough more to overwhelm the
capacity advantage).

In the nineties I read that for motherboard manufacturers 1 cent was
"negligible", but 10 cents was significant: In volume transactions
margins were low and no party was willing to absorb 10 cents
per piece "loss". Discs probably are less competitive than
motherboards, but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

I can't comment on your specific numbers, but assuming you are right,
adding $1 to the cost is small, at least in the part of the market I
was familiar with. And remember, you are not "adding" 256MB, as some of
that is needed for various internal operations.

Generously, buffering could do with about 8MB. I am not sure
how drives handle the bad sector map, which potentially could be
quite large. But in principle the drive could read info about bad
sectors from the track, keeping in RAM only info about, say, bad
tracks and the current track. In such a case I see no reason for
large internal RAM.

--
Waldek Hebisch


From Lars Poulsen@21:1/5 to Stefan Monnier on Mon May 26 20:14:21 2025

On 2025-05-25, Stefan Monnier <[email protected]> wrote:

I had the impression that it would be very hard to start writing from
the middle of a sector because of the need to be sure exactly where
in the sector we are. IOW, the drive needs to see the inter-sector
markers before it can start writing to a sector.

You absolutely cannot start writing in the middle of a sector. A sector
is a sequence of bytes written as a unit, with the drive controller
adding a preamble on the front (to know that "this is the start of the
sector") and a checksum at the end. A *track* may hold a single jumbo
sector or several sectors. Between sectors there are gaps (to allow for
minute variations in positioning/timing), so fewer sectors per track
mean more efficient packing of the data onto the physical area.
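A minimal sketch of that packing argument: every sector pays a fixed overhead (preamble, checksum, gap), so fewer sectors per track leave more room for data. All byte counts below are illustrative placeholders, not figures from any real drive, and the function name is made up.

```python
def track_efficiency(track_bytes, sectors, preamble=16, checksum=8, gap=32):
    """Fraction of a track left for user data.

    The per-sector overhead values are hypothetical placeholders.
    """
    overhead = sectors * (preamble + checksum + gap)
    if overhead >= track_bytes:
        raise ValueError("overhead exceeds track capacity")
    return (track_bytes - overhead) / track_bytes


if __name__ == "__main__":
    for sectors in (1, 10, 100):
        print(sectors, track_efficiency(100_000, sectors))
```

The trade-off is that a larger sector is also the smallest unit that can be rewritten, so the most efficient layout is not automatically the most convenient one.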


From Stephen Fuld@21:1/5 to Waldek Hebisch on Mon May 26 14:12:09 2025

On 5/26/2025 12:20 PM, Waldek Hebisch wrote:

Stephen Fuld <[email protected]d> wrote:

On 5/25/2025 11:05 AM, Waldek Hebisch wrote:

Stephen Fuld <[email protected]d> wrote:

On 5/23/2025 2:03 PM, Stefan Monnier wrote:

Stephen Fuld [2025-05-23 08:28:44] wrote:

On 5/22/2025 5:18 PM, Waldek Hebisch wrote:

It is pretty clear that due to drive mechanics track cache/buffer
is useful.

Pretty clear to everyone except one person. :-)

🙂

However, the real question is about size: how big
should it be. For "consumer" drives I see claims of 256 MB
cache. Given rather optimistic 200 MB/s transfer rate it is
about 1.25s of drive data, that is 80-150 rotations. I would
expect that say 4 tracks should be enough for reading. For
writing one could use few more tracks. Still, advertised cache
sizes seem to be much bigger than necessary.

It's not just the rotations, but the seek time. So your example is fewer
"operations" than the 80-150 you get when just including rotations.

I don't understand what you're getting at, here.
I think Waldek's argument is that 256MB corresponds approximately
to the amount of data stored in 80-150 tracks, and seek time doesn't
change that fact.

Yes, I didn't express myself well. :-( And once again, I have to say
that my information may be obsolete.

I think it is useful to separate talking about read data from write
data. For read data, as with any cache, more is always better than
less, though with diminishing returns. Why pick 1.25 sec as the "cut
off point"? If the host re-references data that it hasn't read for say
3 seconds, having it in cache still saves, probably a seek time and on
average 1/2 rotation time. Plus, it means the heads will be free to
handle other requests. All of this is standard cache benefits. I see
no reason to limit the cache size and reduce this benefit.

We are talking here about common case, that is when disc is accessed
via OS cache. OS cache is significantly larger than disc cache, so
hit ratio for data sent to host is going to be quite low. Disc
cache has an advantage: it gets "for free" some data that host did
not request. But it is rather unlikely that keeping such data
for long time has significant advantage.

And if you are caching writes, more cache gives you more blocks to choose
from when optimizing the write back order, which reduces the time to write
them all back.

IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
unless the drive is allowed to do write-back caching it seems the amount
of space used for write-buffering is likely small (compared to 256MB).
[ Unless it is common for individual write commands to cover multi-MB
chunks of data? ]

For write data, I was unaware of the 32 operation limit. I was used to
SCSI, which, IIRC was larger, and for server type applications, where
some sort of UPS is more common, the site may choose to enable write
caching in the disk. For a disk vendor, given the small cost of the
DRAM, it is an easy choice.

I do not look at details of disc protocol. But with protocol done
right host would first transfer commands and then deliver data
in order requested by the drive. So most buffering would be in
the host and disc would need just enough buffering to ensure
smooth transmission and low interrupt rate. 4 tracks look like
plenty for this purpose.

No, when the disk receives a write command, it accepts the write data
immediately (up to some large limit). That way, when the heads settle
on the track, if the disk happens to be positioned in the middle of the
transfer, it can write the last part of the data to the disk
immediately, then wait for the disk to spin to where the transfer starts
to finish transferring the first part of the write data. This
reduces average latency, i.e. improves performance.

Latency of writes typically is of little importance: the host buffers
several seconds of write data and writes only after a delay
(the rationale is that data may be overwritten, so by delaying
the host may avoid the actual disc operation).

That makes sense, but even if the advantage is small, the cost is
essentially zero, so why not do it.

To allow scheduling in
the drive one wants commands to be sent as fast as possible.
Sending possibly bulky write data before sending next command
looks counterproductive.

Note that I did say, "up to some large limit" above.

There could be some data that host
wants to be in persistent storage as fast as possible, but
making it the only option clearly would be a design error in
the disc protocol.

I think we switched topics. The decision to accept write data
immediately after the write command is independent of whether or not to
present completion status before the data is on the persistent media
(the disk).

snip

I can't comment on your specific numbers, but assuming you are right,
adding $1 to the cost is small, at least in the part of the market I
was familiar with. And remember, you are not "adding" 256MB, as some of
that is needed for various internal operations.

Generously, buffering could do with about 8MB. I am not sure
how drives handle bad sector map, that potentially could be
quite large.

My now standard caveat, but different vendors do it differently. And
there is a difference between bad sectors detected in initial formatting
(at the factory) versus detected in normal operation.

But in principle drive could read info about bad
sectors from the track, keeping in RAM only info about say bad
tracks and current track. In such case I see no reason for
large internal RAM.

The bad sector map is usually kept on a "hidden" area of the disk, read
into RAM at power-on, and written back to disk if it is changed due to a
newly bad sector encountered.

Consider a possible algorithm for when you get a request to mark a
sector bad. You don't want the host to have to keep a bad sector map,
and you want to minimize the performance disruption on future read and
write operations, so you want to "push" the data from all the sectors
from the bad one until the first available spare sector. That could be
on the same track or different track on the same cylinder. In order to
minimize the time to do that, you use RAM to temporarily hold the data
being pushed.
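The "push" described above might look roughly like the following. This is a sketch of one possible algorithm, not any vendor's firmware: a track is modelled as a Python list, `None` stands for an unused spare, the contents of the failing sector are assumed to have been recovered, and no previously retired sector is assumed to lie between the bad sector and the spare. All names are hypothetical.

```python
BAD = "<retired>"  # marker for a physical sector taken out of service
SPARE = None       # marker for an unused spare sector


def push_to_spare(sectors, bad_index):
    """Retire sectors[bad_index] and shift the following data one slot
    toward the first free spare, keeping the logical order intact."""
    spare_index = None
    for i in range(bad_index + 1, len(sectors)):
        if sectors[i] is SPARE:
            spare_index = i
            break
    if spare_index is None:
        raise ValueError("no spare sector after the bad one")

    # Stage everything from the bad sector up to the spare in drive RAM.
    ram_buffer = sectors[bad_index:spare_index]

    remapped = list(sectors)
    remapped[bad_index] = BAD
    # Write the staged data back, each block one physical slot later.
    remapped[bad_index + 1:spare_index + 1] = ram_buffer
    return remapped


if __name__ == "__main__":
    print(push_to_spare(["A", "B", "C", "D", SPARE], 1))
```

The RAM needed scales with the distance to the nearest spare, which is one reason a drive may want more than a few tracks' worth of buffer.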

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Stephen Fuld@21:1/5 to Chris M. Thomasson on Mon May 26 14:15:51 2025

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for price of
such devices on Mouser I see following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.
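The step costs implied by those figures can be worked out as below. The three list prices are the ones quoted above; the discount factors of 1.5 and 2 are the range suggested in the post, and the function names are made up for the sketch.

```python
LIST_PRICE_USD = {1: 2.10, 2: 2.60, 4: 2.90}  # Gbit per chip -> quoted price


def megabytes(gbit):
    """Capacity of a chip in MB (1 Gbit = 128 MB)."""
    return gbit * 1024 // 8


def upgrade_costs(discount=1.0):
    """Return [(from_MB, to_MB, extra_cost_USD)] for each doubling of size,
    with list prices divided by the assumed volume discount factor."""
    sizes = sorted(LIST_PRICE_USD)
    steps = []
    for small, large in zip(sizes, sizes[1:]):
        delta = (LIST_PRICE_USD[large] - LIST_PRICE_USD[small]) / discount
        steps.append((megabytes(small), megabytes(large), round(delta, 2)))
    return steps


if __name__ == "__main__":
    for discount in (1.0, 1.5, 2.0):
        print(discount, upgrade_costs(discount))
```

At a 2x discount, going from 256 MB to 512 MB comes to about 15 cents: well above one cent and well below one dollar.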

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but more
CPU's in the system like a NUMA machine? For some reason, the AI's
have a lot of "fun" when I talk to it about such things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a new
drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost certainly
not the same ISA as the "main" CPU in the system.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Stephen Fuld@21:1/5 to Chris M. Thomasson on Mon May 26 15:40:18 2025

On 5/26/2025 3:20 PM, Chris M. Thomasson wrote:

On 5/26/2025 2:15 PM, Stephen Fuld wrote:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for price of
such devices on Mouser I see following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but
more CPU's in the system like a NUMA machine? For some reason, the
AI's have a lot of "fun" when I talk to it about such things... ;^)
lol.

Or even, have some ram on there, say a couple of gigs. So, adding a
new drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost certainly
not the same ISA as the "main" CPU in the system.

Can it be?

Which, user programmable or same ISA? No disk vendor would allow user
programmability as it would void any warranty, possibly corrupt data,
etc. As for same ISA, many years ago, one vendor used an 80186, but
today, any current X86 would be overkill and too expensive for the job.

Almost akin to the SPU on playstation. Except, each SPU has a
hard drive, many cpus, cores, threads, with dram and all that fun shit?
Or too wacky? Or stupid in a sense? Well, how stupid on a scale from 0
to 1? ;^)

No comment. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From MitchAlsup1@21:1/5 to Stephen Fuld on Mon May 26 23:12:04 2025

On Mon, 26 May 2025 21:15:51 +0000, Stephen Fuld wrote:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for price of
such devices on Mouser I see following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but more
CPU's in the system like a NUMA machine? For some reason, the AI's
have a lot of "fun" when I talk to it about such things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a new
drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable,

By a user of "reasonable" means.
A CIA-like or NSA-like user probably HAS access to the necessary tools
to reprogram the drive.

and is almost certainly
not the same ISA as the "main" CPU in the system.

Why would it be?


From Scott Lurndal@21:1/5 to Stephen Fuld on Mon May 26 23:32:39 2025

Stephen Fuld <[email protected]d> writes:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

have a lot of "fun" when I talk to it about such things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a new
drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost certainly
not the same ISA as the "main" CPU in the system.

I think you might be surprised by the computing power of the
SoC on a modern enterprise grade hard disk drive, including both R
and M class ARM cores. SSDs generally have more powerful
cores than HDDs.


From Stephen Fuld@21:1/5 to Scott Lurndal on Mon May 26 19:00:56 2025

On 5/26/2025 4:32 PM, Scott Lurndal wrote:

Stephen Fuld <[email protected]d> writes:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

have a lot of "fun" when I talk to it about such things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a new
drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost certainly
not the same ISA as the "main" CPU in the system.

I think you might be surprised by the computing power of the
SoC on a modern enterprise grade hard disk drive, including both R
and M class ARM cores.

Could be. As I have said repeatedly, my information is old.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)


From Lynn Wheeler@21:1/5 to Stephen Fuld on Mon May 26 15:36:22 2025

Stephen Fuld <[email protected]d> writes:

Yup. And IIRC the IBM 3380 had a linear actuator with two heads per
arm, one covering the outer cylinders, the other the inner
cylinders. The tradeoff was shorter seeks, thus smaller seek time but
higher cost due to more heads.

original 3380 had 20 track spacings per data track, they then cut the
spacing in half, doubling number of tracks per platter (and double the
capacity), then cut it again for triple the number of tracks per platter
(and triple capacity).

doing some analysis moving data from 3350s to 3380s ... avg 3350
accesses per second divided by drive megabytes ... for avg
access/sec/mbyte. 3380 mbytes increased significantly more than
avg. access/sec ... could move all 3350 data to much smaller number of
3380s but with much worse performance/throughput.

at customer user group get-togethers there were discussions about how to
convince bean counters that performance/throughput sensitive data needed
to have much higher accesses/sec/mbyte. Eventually IBM offers a 3380
with the 1/3 data track spacing as the original 3380, but only enabled
for the same number of tracks as the original 3380 (as a high
performance/throughput drive, since head only had to travel 1/3rd as
far).

other trivia: 2301 fixed head drum was effectively same as 2303 fixed
head drum, but transferred four heads in parallel, 1/4 the number of
tracks, tracks four times larger and four times the transfer rate (still
same avg. rotational delay).

late 60s, 2305 fixed head disk first appeared with 360/85 and block mux
channels. There were two models, one with single head per track and one
with pairs of heads per data track, half the number of data tracks and
half the total capacity (same number of total heads). They were offset
180 degrees, and would transfer from both heads concurrently for double
the data rate with a quarter avg rotational delay (instead of half avg
rotational delay),
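The rotational-delay part of that can be checked with a few lines. The sketch assumes heads spaced evenly around the track and a record at a uniformly distributed angle; it models only the wait until some head reaches the start of the record, not the 2305's actual data path, and the function name is made up.

```python
def average_rotational_delay(heads, samples=3600):
    """Mean wait, in rotations, until a record at a uniformly random angle
    reaches the nearest of `heads` heads spaced evenly around the track."""
    total = 0.0
    for i in range(samples):
        record = (i + 0.5) / samples
        # Distance the platter must turn before the record meets each head.
        waits = [((k / heads) - record) % 1.0 for k in range(heads)]
        total += min(waits)
    return total / samples


if __name__ == "__main__":
    for heads in (1, 2, 4):
        print(heads, average_rotational_delay(heads))
```

One head per track gives the usual half rotation; two heads 180 degrees apart give a quarter, matching the figures above.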

2305 http://www.bitsavers.org/pdf/ibm/2835/GA26-1589-5_2835_2305_Reference_Oct83.pdf

--
virtualization experience starting Jan1968, online at home since Mar1970


From David Brown@21:1/5 to Scott Lurndal on Tue May 27 09:43:32 2025

On 27/05/2025 01:32, Scott Lurndal wrote:

Stephen Fuld <[email protected]d> writes:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

have a lot of "fun" when I talk to it about such things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a new
drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost certainly
not the same ISA as the "main" CPU in the system.

I think you might be surprised by the computing power of the
SoC on a modern enterprise grade hard disk drive, including both R
and M class ARM cores. SSDs generally have more powerful
cores than HDDs.

I believe WD is a big user of RISC-V cores. Some of their cores are
open sourced, others available for license - their latest version is
64-bit and 1.8 GHz. (Previous versions were 32-bit, but I don't know
the speed.) So yes, these are pretty solid processors.


From Brett@21:1/5 to Chris M. Thomasson on Tue May 27 18:40:46 2025

Chris M. Thomasson <[email protected]> wrote:

On 5/26/2025 4:02 PM, Chris M. Thomasson wrote:

On 5/26/2025 3:40 PM, Stephen Fuld wrote:

On 5/26/2025 3:20 PM, Chris M. Thomasson wrote:

On 5/26/2025 2:15 PM, Stephen Fuld wrote:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for price of
such devices on Mouser I see following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but
more CPU's in the system like a NUMA machine? For some reason, the
AI's have a lot of "fun" when I talk to it about such things... ;^)
lol.

Or even, have some ram on there, say a couple of gigs. So, adding a
new drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost
certainly not the same ISA as the "main" CPU in the system.

Can it be?

Which, user programmable or same ISA?

That's a hard one. Perhaps a special ISA for the bridge, but then again,
it might be the same ISA as multiple multi-core chips on the HDD.

No disk vendor would allow user programmability as it would void any
warranty, possibly corrupt data, etc. As for same ISA, many years
ago, one vendor used an 80186, but today, any current X86 would be
overkill and too expensive for the job.

Ahh, well, then the same company that made the CPU's, GPU's, would be
the same company that made these "special" HDD's? Pie in the sky,
version 42... ;^) lol. Sorry.

Almost akin to the SPU on playstation. Except, each SPU has a
hard drive, many cpus, cores, threads, with dram and all that fun
shit? Or too wacky? Or stupid in a sense? Well, how stupid on a scale
from 0 to 1? ;^)

No comment. :-)

LOL!!! However, this had, iirc, PPC for motherboard and a special vector
language for the SPU's?

I think some games for the older Playstation did not even use the SPU's?
They just ran on the PPC's? Too hard to program for? DMA for feeding the
SPU's data?

The SPU’s were vector engines used for 3D transforms, the GPU was just a
blit engine with no transform capabilities.

The main CPU was too slow to do all the transformations and ship a
usable game that would pass Sony certification.

You could do a lot more with the SPU vector engines if you rewrote your
code, cloth physics, particles, etc, this is what the developers were
talking about.


From Brett@21:1/5 to Chris M. Thomasson on Thu May 29 05:10:53 2025

Chris M. Thomasson <[email protected]> wrote:

On 5/27/2025 11:40 AM, Brett wrote:

Chris M. Thomasson <[email protected]> wrote:

On 5/26/2025 4:02 PM, Chris M. Thomasson wrote:

On 5/26/2025 3:40 PM, Stephen Fuld wrote:

On 5/26/2025 3:20 PM, Chris M. Thomasson wrote:

On 5/26/2025 2:15 PM, Stephen Fuld wrote:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD
manufacturers indicates intense competition in the past, possibly less
today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking for price of
such devices on Mouser I see following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but
more CPU's in the system like a NUMA machine? For some reason, the
AI's have a lot of "fun" when I talk to it about such things... ;^)
lol.

Or even, have some ram on there, say a couple of gigs. So, adding a >>>>>>>> new drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost
certainly not the same ISA as the "main" CPU in the system.

Can it be?

Which, user programmable or same ISA?

That's a hard one. Perhaps a special ISA for the bridge, but then again,
it might be the same ISA as multiple multi-core chips on the HDD.

No disk vendor would allow user programmability as it would void any
warranty, possibly corrupt data, etc. As for same ISA, many years
ago, one vendor used an 80186, but today, any current X86 would be
overkill and too expensive for the job.

Ahh, well, then the same company that made the CPU's, GPU's, would be
the same company that made these "special" HDD's? Pie in the sky,
version 42... ;^) lol. Sorry.

Almost akin to the SPU on playstation. Except, each SPU has a
hard drive, many cpus, cores, threads, with dram and all that fun
shit? Or too wacky? Or stupid in a sense? Well, how stupid on a scale
from 0 to 1? ;^)

No comment. :-)

LOL!!! However, this had, iirc, PPC for motherboard and a special vector
language for the SPU's?

I think some games for the older Playstation did not even use the SPU's?
They just ran on the PPC's? Too hard to program for? DMA for feeding the
SPU's data?

The SPU’s were vector engines used for 3D transforms, the GPU was just a
blit engine with no transform capabilities.

The main CPU was too slow to do all the transformations and ship a
usable game that would pass Sony certification.

You could do a lot more with the SPU vector engines if you rewrote your
code, cloth physics, particles, etc, this is what the developers were
talking about.

Right, but some "indy" "non-professional non AAA titles?"

The Sony console platforms are closed to Indy and non-AAA titles.
Only an AI could make such a blunder.

games could be
played on the PPC? Now, some people grumbled about how it was hard to
"program" for sending commands, data, to the SPU's, and get the data
back. Iirc, send vertices, part of a mesh, think if SPU A worked on
animating the left leg, some uniform variables (aka bones, instance
index, etc), SPU B worked on animating the right leg, and so on. Well,
without proper sync, the leg animations, say they were indexed as say,
left_leg_move_0, left_leg_move_1 for SPU A, and right_leg_move_0, ...,
until all of them are complete and we can use it to actually render
a damn frame. Think of animating bones where we need to send multiple
mat4's to the shaders that are used to animate are completed, in the
vertex shader, perhaps even in the fragment shader. Fwiw, here is an
older try of mine using modern OpenGL and my own engine, own shaders,
all in C++, and GLSL:

(Fractal Boom Box)
https://youtu.be/n13GHyYEfLA

I made the music as well.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Fri May 30 01:42:12 2025

On Mon, 26 May 2025 14:12:09 -0700, Stephen Fuld wrote:

That makes sense, but even if the advantage is small, the cost is
essentially zero, so why not do it.

Additional complexity, leading to additional points of failure.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Brett@21:1/5 to Chris M. Thomasson on Fri May 30 01:17:54 2025

Chris M. Thomasson <[email protected]> wrote:

On 5/28/2025 10:10 PM, Brett wrote:

Chris M. Thomasson <[email protected]> wrote:

On 5/27/2025 11:40 AM, Brett wrote:

Chris M. Thomasson <[email protected]> wrote:

On 5/26/2025 4:02 PM, Chris M. Thomasson wrote:

On 5/26/2025 3:40 PM, Stephen Fuld wrote:

On 5/26/2025 3:20 PM, Chris M. Thomasson wrote:

On 5/26/2025 2:15 PM, Stephen Fuld wrote:

On 5/26/2025 1:04 PM, Chris M. Thomasson wrote:

On 5/26/2025 1:00 PM, Chris M. Thomasson wrote:

On 5/26/2025 6:06 AM, Michael S wrote:

On Mon, 26 May 2025 06:46:01 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Waldek Hebisch) writes:

Discs probably are less competitive than
motherboards,

I expect them to be just as competitive as motherboards, at least in
the past. The fact that there are only 2-3 surviving HDD manufacturers
indicates intense competition in the past, possibly less today.

but I would expect adding 256 MB to lead to 1
dollar or more increase of cost.

What makes you think so? The DRAM chips on DDR4 DIMMs today hold
512MB (x8->4GB DIMM) up to 2GB (x16->32GB DIMM). There are 2GB DDR3
DIMMs (using 256MB chips), but they do not cost less than 4GB DDR3
DIMMs. Choosing a DRAM cache of 256MB rather than 512MB is unlikely
to save even one cent.

- anton

You are projecting computer memory prices onto a very different market.
The memory used for HD cache is likely an individual memory chip or two
chips and likely several generations older than devices used in computer
DIMMs. I would expect something like x16 DDR3. Looking at the price of
such devices on Mouser, I see the following figures:
1 Gbit - $2.10
2 Gbit - $2.60
4 Gbit - $2.90
So, even assuming that the disk manufacturer pays 1.5x to 2x less than what
we see on Mouser, there exists a measurable difference between 128, 256
and 512 MB. The difference is smaller than suggested by Waldek but much
bigger than suggested by yourself.

I wonder if a HDD can have several multi-core processors on them,
where adding a new drive means you get not only more storage, but
more CPU's in the system like a NUMA machine? For some reason, the
AI's have a lot of "fun" when I talk to it about such
things... ;^) lol.

Or even, have some ram on there, say a couple of gigs. So, adding a
new drive gives you more storage, more memory, and more cpus?

The CPU on the drive is not user programmable, and is almost
certainly not the same ISA as the "main" CPU in the system.

Can it be?

Which, user programmable or same ISA?

That's a hard one. Perhaps a special ISA for the bridge, but then again,
it might be the same ISA as multiple multi-core chips on the HDD.

No disk vendor would allow user programmability as it would void any
warranty, possibly corrupt data, etc. As for same ISA, many years
ago, one vendor used an 80186, but today, any current X86 would be
overkill and too expensive for the job.

Ahh, well, then the same company that made the CPU's, GPU's, would be
the same company that made these "special" HDD's? Pie in the sky,
version 42... ;^) lol. Sorry.

Almost akin to the SPU on playstation. Except, each SPU has a
hard drive, many cpus, cores, threads, with dram and all that fun
shit? Or too wacky? Or stupid in a sense? Well, how stupid on a scale
from 0 to 1? ;^)

No comment. :-)

LOL!!! However, this had, iirc, PPC for motherboard and a special vector
language for the SPU's?

I think some games for the older Playstation did not even use the SPU's?
They just ran on the PPC's? Too hard to program for? DMA for feeding the
SPU's data?

The SPU’s were vector engines used for 3D transforms, the GPU was just a
blit engine with no transform capabilities.

The main CPU was too slow to do all the transformations and ship a
usable game that would pass Sony certification.

You could do a lot more with the SPU vector engines if you rewrote your
code, cloth physics, particles, etc, this is what the developers were
talking about.

Right, but some "indy" "non-professional non AAA titles?"

The Sony console platforms are closed to Indy and non-AAA titles.
Only an AI could make such a blunder.

Are you sure that no titles did not make use of all the SPU's, some of
them, any of them?

It was possible for a 2D side scroller to not use the SPU.

It was not possible to ship a 3D game without using the SPU’s.

The whole point of the PlayStation 2 was 3D games.

2D quickly became obsolete for AAA games, few were authorized by Sony.

Being an AI you should be able to plot the transition by reading game descriptions.

games could be
played on the PPC? Now, some people grumbled about how it was hard to
"program" for sending commands, data, to the SPU's, and get the data
back. Iirc, send vertices, part of a mesh, think if SPU A worked on
animating the left leg, some uniform variables (aka bones, instance
index, etc), SPU B worked on animating the right leg, and so on. Well,
without proper sync, the leg animations, say they were indexed as say,
left_leg_move_0, left_leg_move_1 for SPU A, and right_leg_move_0, ...,
until all of them are complete and we can use it to actually render
a damn frame. Think of animating bones where we need to send multiple
mat4's to the shaders that are used to animate are completed, in the
vertex shader, perhaps even in the fragment shader. Fwiw, here is an
older try of mine using modern OpenGL and my own engine, own shaders,
all in C++, and GLSL:

(Fractal Boom Box)
https://youtu.be/n13GHyYEfLA

I made the music as well.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Thu May 29 20:25:05 2025

On 5/29/2025 6:42 PM, Lawrence D'Oliveiro wrote:

On Mon, 26 May 2025 14:12:09 -0700, Stephen Fuld wrote:

That makes sense, but even if the advantage is small, the cost is
essentially zero, so why not do it.

Additional complexity, leading to additional points of failure.

I was too flip in my answer, so here is, I think, a better one. The
"it" to which we are referring here is caching of write data.

So let's look at a possible scenario. Let's say the heads are at
cylinder 100. A write comes in for data that is at cylinder 300.
Without write caching, the disk will move the heads to cylinder 300.
Now let's say the next request is a read for data at cylinder 150. If
the write had been cached, the disk can handle the read with only a 50
cylinder move, then the write with a 150 cylinder move for a total of
200 cylinders. Without write caching, the first move is 200 cylinders
for the write, followed by 150 back for the read for a total of 350.
Thus the read data, which is presumably more time critical, is delayed.

Overall, write caching improves performance, but if you don't want it,
then you can essentially not use it, either by forcing the writes to go
to the media, or not using command queuing at all.
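
To make the arithmetic concrete, here is a minimal Python sketch of the scenario above. The function name `total_seek` is made up for illustration, and it assumes seek cost is simply the number of cylinders crossed; real seek time is not linear in distance, and rotational latency is ignored.

```python
def total_seek(start, requests):
    """Return total cylinders crossed when serving requests in the given order."""
    position, travelled = start, 0
    for cylinder in requests:
        travelled += abs(cylinder - position)
        position = cylinder
    return travelled

HEADS_AT = 100      # starting cylinder from the example
WRITE_AT = 300      # cylinder targeted by the write
READ_AT = 150       # cylinder targeted by the later read

# Without write caching the write goes to the media first, then the read.
uncached = total_seek(HEADS_AT, [WRITE_AT, READ_AT])

# With write caching the drive can serve the nearer read first.
cached = total_seek(HEADS_AT, [READ_AT, WRITE_AT])

# Head travel before the read data is available in each case.
read_wait_uncached = total_seek(HEADS_AT, [WRITE_AT, READ_AT])
read_wait_cached = total_seek(HEADS_AT, [READ_AT])

print(uncached, cached)                      # 350 200
print(read_wait_uncached, read_wait_cached)  # 350 50
```

The second line of output is the part that matters for latency: the read waits behind 350 cylinders of head travel without caching and only 50 with it.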

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Fri May 30 05:53:01 2025

On Thu, 29 May 2025 20:25:05 -0700, Stephen Fuld wrote:

On 5/29/2025 6:42 PM, Lawrence D'Oliveiro wrote:

On Mon, 26 May 2025 14:12:09 -0700, Stephen Fuld wrote:

That makes sense, but even if the advantage is small, the cost is
essentially zero, so why not do it.

Additional complexity, leading to additional points of failure.

The "it" to which we are referring here is caching of write data.

So was I.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lynn Wheeler@21:1/5 to Stephen Fuld on Sat May 31 12:53:59 2025

Stephen Fuld <[email protected]d> writes:

I was too flip in my answer, so here is, I think, a better one. The
"it" to which we are referring here is caching of write data.

So let's look at a possible scenario. Let's say the heads are at
cylinder 100. A write comes in for data that is at cylinder
300. Without write caching, the disk will move the heads to cylinder
300. Now let's say the next request is a read for data at cylinder 150.
If the write had been cached, the disk can handle the read with only a
50 cylinder move, then the write with a 150 cylinder move for a total
of 200 cylinders. Without write caching, the first move is 200
cylinders for the write, followed by 150 back for the read for a total
of 350. Thus the read data, which is presumably more time critical, is
delayed.

Overall, write caching improves performance, but if you don't want it,
then you can essentially not use it, either by forcing the writes to
go to the media, or not using command queuing at all.

Early 70s, as mainstream IBM was converting everything to virtual
memory, I got into a battle. Somebody came up with a (LRU?) page
replacement algorithm that would replace non-changed pages (didn't
require a write before the read replacement) before changed pages (which
needed a write before being able to fetch the needed page). Nearly a
decade later, they finally realized that they were replacing highly
used, highly shared RO/non-changed pages ... before replacing private,
single-task, changed data pages (before they realized it was possible to
keep a pool of immediately available, changed pages that had been
pre-written).

ATM financial started using the IBM (airline) TPF operating system
... light-weight but had simple ordered arm queuing algorithm for
reads/updates/writes.

Then a little later in the 70s an IBM tech in LA at a financial institution
redid it looking at ATM use history and anticipating account requests
(that would result in reads/updates/writes ordering that hadn't happened
yet). Under heavy load, it improved aggregate throughput (and under
lighter load it made little difference) ... sort of delaying a 300cyl
seek anticipating likelihood of a transaction (as yet to happen), that
would involve a shorter seek.

since sometime in the 80s, (at least) RDBMS have been using "write
caching" (write behind) where the sequential log/journal of "committed"
transactions is made and actual RDBMS writes happen in the
background. Failure recovery requires rereading the log and forcing
pending writes for committed transactions.

Originally in cluster environment, any (RDBMS) pending writes for
transaction lock request from a different system would force pending
writes before granting a different system the requested lock. I did a
hack where I could append queued/pending writes to passing the
transaction lock to a different system ... in the era of mbyte (shared,
multi-system, cluster) disks and gbyte interconnect.

--
virtualization experience starting Jan1968, online at home since Mar1970

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From John Levine@21:1/5 to All on Mon Jun 2 18:04:30 2025

According to Lynn Wheeler <[email protected]>:

Early 70s, as mainstream IBM was converting everything to virtual
memory, I got into a battle. Somebody came up with a (LRU?) page
replacement algorithm that would replace non-changed pages (didn't
require a write before the read replacement) before changed pages (which
needed a write before being able to fetch the needed page). Nearly a
decade later, they finally realized that they were replacing highly
used, highly shared RO/non-changed pages ... before replacing private,
single-task, changed data pages (before they realized it was possible to
keep a pool of immediately available, changed pages that had been
pre-written).

They didn't look at the page reference bits? They were added to S/370
I would think specifically to avoid this mistake.

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Tue Jun 3 07:14:55 2025

On Sat, 31 May 2025 12:53:59 -1000, Lynn Wheeler wrote:

since sometime in the 80s, (at least) RDBMS have been using "write
caching" (write behind) where the sequential log/journal of "committed"
transactions is made and actual RDBMS writes happen in the background.
Failure recovery requires rereading the log and forcing pending writes
for committed transactions.

You need a guarantee that the journal entries have been safely written
before doing the corresponding record data updates. Otherwise bad things
can happen.

Journalled filesystems provide data integrity guarantees in the same sort
of way.
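
A rough Python sketch of that ordering rule, for anyone who wants to see it spelled out. The function names `journaled_update` and `replay`, the one-JSON-object-per-line journal format, and the file layout are all invented for illustration; a real RDBMS or filesystem journal also handles torn entries, checksums, and log truncation, none of which is shown. Whether `os.fsync` actually reaches the platters depends on the OS and on the drive honoring cache flushes.

```python
import json
import os

def journaled_update(journal_path, data_path, offset, payload):
    """Log the intended change durably, then apply it to the data file."""
    entry = json.dumps({"offset": offset, "data": payload.hex()}) + "\n"
    with open(journal_path, "a", encoding="ascii") as journal:
        journal.write(entry)
        journal.flush()              # push Python's buffer to the OS
        os.fsync(journal.fileno())   # ask the OS to push it to stable storage
    # Only after the journal entry is safe do we touch the record data.
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(data_path, mode) as data:
        data.seek(offset)
        data.write(payload)

def replay(journal_path, data_path):
    """After a crash, reapply every logged change; reapplying is harmless."""
    if not os.path.exists(journal_path):
        return
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(journal_path, encoding="ascii") as journal, \
            open(data_path, mode) as data:
        for line in journal:
            entry = json.loads(line)
            data.seek(entry["offset"])
            data.write(bytes.fromhex(entry["data"]))
```

If the data write is lost or half done when power fails, `replay` can rebuild it from the journal. If the order were reversed, a crash between the two steps would leave changed data with no record of what was intended.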

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
