Stephen Fuld <[email protected]> writes:
On 8/15/2025 11:19 AM, BGB wrote:
On 8/15/2025 11:53 AM, John Levine wrote:
According to Scott Lurndal <[email protected]>:
Section 2.7 also describes a 128-entry TLB. The TLB is claimed to
have "typically 97% hit rate". I would go for larger pages, which
would reduce the TLB miss rate.
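For scale, here is a rough sketch of the "reach" of that TLB, meaning the amount of address space it can map when every entry is in use. The 128 entries and the 512-byte page come from the thread; the larger page sizes are hypothetical comparison points only, and `tlb_reach` is a name made up for this illustration.

```python
def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space a fully populated TLB can map at once."""
    return entries * page_size

TLB_ENTRIES = 128  # from the quoted description of Section 2.7

# 512 bytes is the VAX page size; the others are hypothetical alternatives.
for page_size in (512, 1024, 2048, 4096):
    reach = tlb_reach(TLB_ENTRIES, page_size)
    print(f"{page_size:>5}-byte pages: reach = {reach // 1024} KiB")
```

With 512-byte pages the reach works out to 64 KiB, and it scales linearly with page size. That is the basic argument for larger pages lowering the miss rate, all else being equal.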
I think that in 1979 the VAX's 512-byte page was close to optimal. ...
One must also consider that the disks in that era were
fairly small, and 512 bytes was a common sector size.
Convenient for both swapping and loading program text
without wasting space on the disk by clustering
pages in groups of 2, 4 or 8.
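To put a number on the disk-space point, a minimal sketch: the space lost when an image is rounded up to a whole number of allocation units. The 10,000-byte image size is an arbitrary example, not a figure from the thread, and `slack_bytes` is a hypothetical helper name.

```python
def slack_bytes(image_size: int, unit: int) -> int:
    """Unused bytes in the last allocation unit after rounding image_size up."""
    return -image_size % unit

IMAGE_SIZE = 10_000  # hypothetical program text size in bytes

# 512-byte sectors, alone or clustered in groups of 2, 4 or 8.
for sectors in (1, 2, 4, 8):
    unit = 512 * sectors
    print(f"unit {unit:>4} bytes: {slack_bytes(IMAGE_SIZE, unit)} bytes unused")
```

On average the loss is about half a unit per image, so it grows with the unit size. That mattered more when disks were small.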
That's probably it but even at the time the pages seemed rather small.
Pages on the PDP-10 were 512 words which was about 2K bytes.
Yeah.
Can note in some of my own testing, I tested various page sizes, and
seemingly found a local optimum at around 16K.
I think that is consistent with what some others have found. I suspect
the average page size should grow as memory gets cheaper, which leads to
more memory on average in systems. This also leads to larger programs,
as they can "fit" in larger memory with less paging. And as disks
(spinning or SSD) get faster transfer rates, the cost (in time) of
paging a larger page goes down. While 4K was the sweet spot some
decades ago, I think it has increased, probably to 16K. At some point
in the future, it may get to 64K, but not for some years yet.
ARM64 (ARMv8) architecturally supports 4k, 16k and 64k. When
ARMv8 first came out, one vendor (Red Hat) shipped using 64k pages,
while Ubuntu shipped with 4k page support. 16k support by the
processor was optional (although the Neoverse cores support all
three, some third-party cores developed before ARM added 16k
pages to the architecture specification only supported 4k and 64k).
Say, for example, at 64K one part of the page may be being accessed
readily but another part of the page isn't really being accessed at all
(and increasing page size only really sees benefit for TLB miss rate so
long as the whole page is "actually being used").
Not necessarily. Consider the case of a 16K (or larger) page with two
"hot spots" that are more than 4K apart. That takes 2 TLB slots with 4K
pages, but only one with larger pages.
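A small sketch of the two-hot-spot case: count how many distinct pages, and so how many TLB entries, a set of addresses touches at a given page size. The two addresses are made up for illustration; they sit 10 KiB apart inside one 16K-aligned region.

```python
def tlb_entries_needed(addresses, page_size):
    """Number of distinct pages (hence TLB entries) covering the addresses."""
    return len({addr // page_size for addr in addresses})

# Hypothetical hot spots, more than 4K apart but within the same 16K page.
hot_spots = [0x10100, 0x12900]

for page_size in (4096, 16384, 65536):
    n = tlb_entries_needed(hot_spots, page_size)
    print(f"{page_size // 1024:>2}K pages: {n} TLB entries")
```

This only counts entries. It says nothing about how much of each page is actually used, which is the other side of the argument.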
Note that the ARMv8 architecture[*] supports terminating the table walk
before reaching the smallest level, so with 4K pages[**], a single TLB
entry can cover 4K, 2MB or 1GB blocks. With 16K pages, a single
TLB entry can cover 16K, 32MB or 64GB blocks. 64K pages support
64K, 512MB and 4TB block sizes.
[*] Intel, AMD and others have similar "large page" capabilities.
[**] Granules, in ARM terminology.
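The block sizes listed above follow from the granule size. A minimal sketch, assuming 8-byte descriptors, so that one table of granule size holds granule/8 entries and each level up multiplies the coverage by that factor. The function names are made up for this illustration, and the sketch ignores which block levels a given implementation actually supports, which can depend on optional architecture features.

```python
def block_sizes(granule: int):
    """Return (page, level-2 block, level-1 block) sizes in bytes.

    Assumes 8-byte descriptors: a table occupying one granule holds
    granule // 8 entries, and stopping the walk one level earlier
    multiplies the mapped size by that many entries.
    """
    entries = granule // 8
    page = granule
    l2_block = page * entries
    l1_block = l2_block * entries
    return page, l2_block, l1_block

def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} PiB"

for granule in (4096, 16384, 65536):
    print(", ".join(human(size) for size in block_sizes(granule)))
```

For the three granules this reproduces the 4K/2MB/1GB, 16K/32MB/64GB and 64K/512MB/4TB figures quoted in the post.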