On Thursday, 20 June 2024 16:29:11 BST Peter Humphrey wrote:
On Thursday, 20 June 2024 14:40:12 BST Michael wrote:
On Thursday, 20 June 2024 14:27:18 BST Jack wrote:
On 6/20/24 8:46 AM, Peter Humphrey wrote:
While building a new KDE system (see my post a few minutes ago), I'm
finding the system stalling because it can't handle all its install jobs. I have this set:
$ grep '\-j' /etc/portage/make.conf
EMERGE_DEFAULT_OPTS="--jobs --load-average=30 [...]"
I don't know how much it would matter, but are you missing a number after --jobs?
Without a number of jobs specified in make.conf, emerge will not limit the number of packages it tries to build, except that it will not start new jobs while the load average is already at least 30 (--load-average=30).
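For illustration, a minimal make.conf sketch with an explicit job count might look like the following. The value 4 is an assumption chosen only for the example, not something suggested in this thread, and the other options elided as "[...]" in the original line are left out here.

```shell
# /etc/portage/make.conf (fragment)
# --jobs=4 is an assumed, illustrative value: it caps how many packages
# emerge builds at once. --load-average=30 is the value from the post.
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=30"

# Per-package make parallelism, as quoted in the post.
MAKEOPTS="-j16 -l16"
```

With a number after --jobs, emerge stops starting new packages once that many are in progress, instead of relying on the load-average check alone.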
MAKEOPTS="-j16 -l16"
We went through all this at great length not long ago (months, perhaps: a certain A. McK had returned to the list for a while). /usr/bin/make will
stop spawning make jobs once either (a) the number it's running reaches
-j16 or (b) the load average of those reaches -l16. Portage sending more tasks to /usr/bin/make simply fills the latter's input queue.
Quite. Make will queue up anything above ~16 jobs, but emerge runs more than just make jobs. More and more emerge processes will kick off, up to ~30.
Each emerge process will eventually launch make jobs, only for these to join a pile-up in an ever more congested make queue, unable to proceed further. At some point the allocation and reallocation of memory for these queues appears to have become gnarly. Perhaps something in portage's Python code leads to a race condition? I don't know whether the queuing of all these parent-child instructions, combined with their parallelism, can create an unchecked race condition, whether you reached some memory allocation limit, or whether there is indeed a bug in the code. These are just loose suppositions of mine, not backed by detailed debugging, let alone knowledge of Python.
The CPU has 24 threads and 64GB RAM, and lots of swap space, and those values have worked well for some time. Now, though, I'm going to have to
limit the --jobs or the --load-average.
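As a rough sanity check when picking those limits, the theoretical ceiling on simultaneous compile processes is the number of concurrent emerge jobs multiplied by the make -j value, since each package build runs its own make. The sketch below assumes an explicit --jobs value of 4 purely for illustration; the function name is hypothetical, and in practice the -l and --load-average limits throttle things well below this ceiling.

```shell
#!/bin/sh
# Hypothetical helper: upper bound on simultaneous compile processes,
# ignoring the throttling done by make -l and emerge --load-average.
worst_case_processes() {
    # $1 = concurrent emerge jobs, $2 = make -j value
    echo $(( $1 * $2 ))
}

threads=24      # CPU threads, from the post
make_jobs=16    # MAKEOPTS -j value, from the post
emerge_jobs=4   # assumed, illustrative --jobs value

ceiling=$(worst_case_processes "$emerge_jobs" "$make_jobs")
echo "threads=${threads} ceiling=${ceiling}"

if [ "$ceiling" -gt "$threads" ]; then
    echo "ceiling exceeds thread count: the load-average limits do the throttling"
fi
```

With --jobs given no number at all, there is no fixed ceiling of this kind, and only the load-average checks hold things back.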
On interrupting one such hang, I found that 32 install jobs had been waiting to run; is this limit hard coded?
It's certainly a suspicious number.
Apologies if I'm being dense here - why is it a suspicious number? I see a --load-average of ~30 emerge-instigated 'make install' jobs being queued up, while some previous 16 x make jobs are currently being processed.
I take it the --load-average is what it says, an average, so it will jump above the specified number if you have not limited the --jobs number.
See above re. input queue.
I also saw "too many jobs" or something, and "could not read job counter".
Is it now bug-report time?
You could set up a swap file, to avoid OOM situations, while you're tweaking the --jobs & --load-average.
The existing 64GiB swap partition is rarely touched, if ever. I've never
seen an OOM error. I haven't touched jobs or loads for many months until today, nor have I seen a failure to read a job counter.
I don't know if counters are stored in memory, with running/completed/failed counts, or on disk. I can't think either DDR4, or an NVMe, would clog up
their I/O channels, but you clearly witnessed a failure. Could this be a hardware glitch? You'll soon know if it shows up as a repeatable problem.
Anyway, it still rankles that I can't use more than half the machine's power because of limits in portage. This can't be the only 64GiB machine in
gentoo-land, surely.
I use 64G with no swap and MAKEOPTS="-j25 -l24.8"
I haven't as yet come across a failure like yours, but I rarely try to run
more than one emerge process at a time on this system. It's fast enough for
my limited needs without having to increase the number of emerges at a time.
On another PC which I often use as a binhost with 32G RAM, when I start two separate emerge processes manually with MAKEOPTS="-j10 -l9.8" I see swap being used a bit sometimes, especially when the PC user is hammering the browser
for hours with many tabs open. Anyway, the MAKEOPTS directives control resource usage without hiccups.
Does running parallel emerges save enough time to justify adding EMERGE_DEFAULT_OPTS?