XPost: linux.debian.kernel
Hi all,
On Sun, 2025-05-04 at 13:45 +0200, Laurent Bonnaud wrote:
[...]
- Previously the kernel would output an error in /var/lib/systemd/pstore/ but would shut down anyway.
- Now, with kernel 6.1.135-1, the shutdown is blocked as with 6.12.x kernels (see below).
--
Laurent.
<30>[ 961.098671] systemd-shutdown[1]: Rebooting.
<6>[ 961.098743] kvm: exiting hardware virtualization
<6>[ 961.361878] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
<6>[ 961.414526] ACPI: PM: Preparing to enter system sleep state S5
<0>[ 963.828210] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
<0>[ 963.828213] {1}[Hardware Error]: event severity: fatal
<0>[ 963.828214] {1}[Hardware Error]: Error 0, type: fatal
<0>[ 963.828216] {1}[Hardware Error]: section_type: PCIe error
<0>[ 963.828216] {1}[Hardware Error]: port_type: 0, PCIe end point
<0>[ 963.828217] {1}[Hardware Error]: version: 3.0
<0>[ 963.828218] {1}[Hardware Error]: command: 0x0002, status: 0x0010
<0>[ 963.828220] {1}[Hardware Error]: device_id: 0000:01:00.1
<0>[ 963.828221] {1}[Hardware Error]: slot: 6
<0>[ 963.828222] {1}[Hardware Error]: secondary_bus: 0x00
<0>[ 963.828223] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x1563
<0>[ 963.828224] {1}[Hardware Error]: class_code: 020000
<0>[ 963.828225] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00018000
<0>[ 963.828226] {1}[Hardware Error]: aer_uncor_severity: 0x000ef010
<0>[ 963.828227] {1}[Hardware Error]: TLP Header: 40000001 0000000f 90028090 00000000
[...]
It seems that this is a known bug in the BIOS of several Dell PowerEdge
models including (in this case) the R540.
A workaround was added to the tg3 driver
<https://git.kernel.org/linus/e0efe83ed325277bb70f9435d4d9fc70bebdcca8>
and a similar change was proposed (but not accepted) in the i40e driver
<https://lore.kernel.org/all/[email protected]/>.
On this system the error log points to a device handled by the ixgbe
driver, and no workaround has been implemented for that.
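For anyone wanting to read the AER fields in the log by hand, here is a
minimal Python sketch. It assumes the bit layout of the AER Uncorrectable
Error Status/Mask registers from the PCI Express Base Specification; the
helper name decode_aer_uncor is made up for illustration and is not part
of any kernel tool.

```python
# Bit positions in the AER Uncorrectable Error Status/Mask/Severity
# registers (assumed from the PCI Express Base Specification).
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
}


def decode_aer_uncor(value):
    """Return the names of the bits set in an AER uncorrectable register value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits missing from the table are reported by number, not guessed.
            names.append(AER_UNCOR_BITS.get(bit, f"bit {bit} (not in this table)"))
    return names


# Values copied from the aer_uncor_status / aer_uncor_mask log line.
print("status:", decode_aer_uncor(0x00100000))
print("mask:  ", decode_aer_uncor(0x00018000))
```

With the values from the log, the status decodes to Unsupported Request
Error, and the mask covers Completer Abort and Unexpected Completion.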
Since this issue seems to affect multiple different NIC vendors and
drivers, would it make more sense to implement this workaround as a PCI
quirk?
Ben.
--
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed
- Carolyn Scheppner