• Cubox-i: Kernel-Oops: Unable to handle kernel NULL pointer dereference

    From Rainer Dorsch@21:1/5 to All on Sun Feb 9 16:20:01 2025
    Hello,

    during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel
    Oops (though the reboot did complete eventually):

    [2406987.476525] 8<--- cut here ---
    [2406987.479798] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [2406987.488157] [00000000] *pgd=00000000
    [2406987.491976] Internal error: Oops: 5 [#1] SMP ARM
    [2406987.496795] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_comment ip6_tables xt_recent ipt_REJECT nf_reject_ipv4 xt_conntrack xt_hashlimit xt_addrtype xt_mark nft_chain_nat xt_MASQUERADE xt_CT xt_tcpudp nft_compat xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcsec_gss_krb5 nfsv4 dns_resolver nfs nf_tables libcrc32c fscache netfs nfnetlink zram(-) zsmalloc binfmt_misc caam_jr caamhash_desc caamalg_desc crypto_engine authenc libdes dw_hdmi_ahb_audio dw_hdmi_cec brcmfmac evdev brcmutil imx6_media_csi(C) v4l2_fwnode ftdi_sio ch341 cfg80211 usbserial
    rfkill snd_soc_imx_spdif caam error video_mux coda_vpu
    [2406987.497296] imx_thermal snd_soc_fsl_spdif snd_soc_fsl_utils
    imx6_media(C) dw_hdmi_imx dw_hdmi imx_pcm_dma drm_display_helper imx_media_common(C) v4l2_jpeg imx_vdoa snd_soc_core v4l2_mem2mem imx2_wdt videobuf2_dma_contig snd_pcm_dmaengine v4l2_async videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm snd_timer videodev imxdrm cec snd etnaviv drm_dma_helper mc gpu_sched soundcore drm_kms_helper imx_ipu_v3 gpio_ir_recv rc_core leds_pwm imx6q_cpufreq 8021q garp mrp stp llc nfsd auth_rpcgss nfs_acl lockd fuse loop drm grace dm_mod configfs sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic at803x ci_hdrc_imx ci_hdrc ulpi fec selftests ahci_imx roles of_mdio libahci_platform ehci_hcd fixed_phy libahci fwnode_mdio udc_core libphy phy_generic nvmem_imx_ocotp usbcore sdhci_esdhc_imx i2c_imx sdhci_pltfm cqhci mux_mmio mux_core libata sdhci usbmisc_imx scsi_mod scsi_common anatop_regulator phy_mxs_usb pwm_imx27 gpio_mxc
    [2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G C 6.1.0-29-armmp #1 Debian 6.1.123-1
    [2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
    [2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
    [2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc
    [2406987.696745] pc : [<bf6c52b4>] lr : [<c034bac0>] psr: 60070013
    [2406987.703204] sp : f0bbde10 ip : c142394c fp : 00000000
    [2406987.708621] r10: 2da6a000 r9 : eed93350 r8 : bf6c52a0
    [2406987.714034] r7 : 00000008 r6 : 00000044 r5 : c1329350 r4 : 00000000
    [2406987.720753] r3 : c140b750 r2 : 00000001 r1 : 00000008 r0 : 00000000
    [2406987.727472] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
    [2406987.734806] Control: 10c5387d Table: 16fec04a DAC: 00000051
    [2406987.740742] Register r0 information: NULL pointer
    [2406987.745645] Register r1 information: non-paged memory
    [2406987.750892] Register r2 information: non-paged memory
    [2406987.756137] Register r3 information: non-slab/vmalloc memory
    [2406987.761993] Register r4 information: NULL pointer
    [2406987.766890] Register r5 information: non-slab/vmalloc memory
    [2406987.772746] Register r6 information: non-paged memory
    [2406987.777993] Register r7 information: non-paged memory
    [2406987.783237] Register r8 information: 7-page vmalloc region starting at 0xbf6c5000 allocated at load_module+0xa70/0x2148
    [2406987.794250] Register r9 information: non-slab/vmalloc memory
    [2406987.800107] Register r10 information: non-paged memory
    [2406987.805440] Register r11 information: NULL pointer
    [2406987.810423] Register r12 information: non-slab/vmalloc memory
    [2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)
    [2406987.822481] Stack: (0xf0bbde10 to 0xf0bbe000)
    [2406987.827035] de00: 00000000 c1329350 00000044 c034bac0
    [2406987.835414] de20: 00000002 c0d29540 c142394c 00000000 f0bbdea4 f0bbdea4 c7102040 f0bbde3c
    [2406987.843789] de40: f0bbde3c b877c437 00000000 00000000 00000000 c140b210 00000008 c1329350
    [2406987.852166] de60: c14232e0 c140b0a8 00000004 c034c81c 00000000 c0d28b54 00000000 00000044
    [2406987.860542] de80: c140b210 00000008 c1329350 c034cb00 00000000 c60e1e00 00000000 00000000
    [2406987.868917] dea0: c140b750 c60e1e10 c2081540 bf6c5318 00000004 bf6c6d58 00000000 c60e1e00
    [2406987.877294] dec0: 00000000 00000000 bf6c6fb4 bf6ca03c c7102040 00000081 0138f138 bf6c6ec0
    [2406987.885671] dee0: c518c734 00000000 00000000 bf6c6fc8 c518c734 c0cf3f28 00000000 00000040
    [2406987.894047] df00: 00000000 c518c720 c1975c4c b877c437 bf6ca480 bf6ca03c 00000000 c7102040
    [2406987.902423] df20: c03002f0 bf6c8a94 bf6ca240 00000800 00000000 c03eecf4 00000006 00000000
    [2406987.910799] df40: 00000000 00000000 00000000 00000000 6d61727a 00000000 00000000 00000000
    [2406987.919172] df60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [2406987.927548] df80: 00000000 00000000 00000000 b877c437 0138f138 0138e190 0138f138 00000000
    [2406987.935925] dfa0: 00000081 c03000c0 0138e190 0138f138 0138f174 00000800 00000000 00000000
    [2406987.944300] dfc0: 0138e190 0138f138 00000000 00000081 beb75eeb 00000001 00000002 0138f138
    [2406987.952675] dfe0: 0043fe10 beb75b8c 0041f0a5 b6c48168 00010030 0138f174 00000000 00000000
    [2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
    [2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
    [2406987.974523] cpuhp_issue_call from __cpuhp_state_remove_instance+0xf8/0x1b4
    [2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34 [zram]
    [2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170 [zram]
    [2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
    [2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
    [2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
    [2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
    [2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
    [2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c
    [2406988.034943] Exception stack(0xf0bbdfa8 to 0xf0bbdff0)
    [2406988.040190] dfa0: 0138e190 0138f138 0138f174 00000800 00000000 00000000
    [2406988.048572] dfc0: 0138e190 0138f138 00000000 00000081 beb75eeb 00000001 00000002 0138f138
    [2406988.056945] dfe0: 0043fe10 beb75b8c 0041f0a5 b6c48168
    [2406988.062194] Code: e52de004 e28dd004 e30b3750 e34c3140 (e5114008)
    [2406988.069040] ---[ end trace 0000000000000000 ]---

    Any idea or hint as to what could cause this is welcome.

    Thanks
    Rainer

    --
    Rainer Dorsch
    http://bokomoko.de/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arnd Bergmann@21:1/5 to Rainer Dorsch on Tue Feb 11 08:30:01 2025
    On Sun, Feb 9, 2025, at 15:38, Rainer Dorsch wrote:

    during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel Oops (though the reboot did complete eventually):

    Hi Rainer,

    [2406987.476525] 8<--- cut here ---
    [2406987.479798] Unable to handle kernel NULL pointer dereference at virtual address 00000000

    A NULL pointer was dereferenced, which in this case is almost
    certainly a logic bug in kernel code.

    [2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G C 6.1.0-29-armmp #1 Debian 6.1.123-1
    [2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
    [2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
    [2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc

    You can get the exact code location by running the oops through
    'addr2line', but the function is fairly short.
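
    As a rough illustration of what to feed into such a lookup, the sketch
    below pulls the symbol, offset, size and module out of the "PC is at" and
    "LR is at" lines. The helper name parse_location and the regular
    expression are assumptions made for this example, not part of any kernel
    tooling.

```python
import re

# Matches the "symbol+0xoffset/0xsize [module]" form seen in the oops.
# The trailing "[module]" part is optional (built-in code has none).
LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: +\[(?P<module>\w+)\])?"
)


def parse_location(text):
    """Return symbol, offset, size and module from an oops location string.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    match = LOCATION_RE.search(text)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in code
    }


pc = parse_location("PC is at zcomp_cpu_dead+0x14/0x58 [zram]")
lr = parse_location("LR is at cpuhp_invoke_callback+0xd4/0x6fc")
print(pc)
print(lr)
```

    Here the fault is 0x14 bytes into a 0x58-byte function that lives in the
    zram module, so the lookup has to be done against the module object with
    debug information rather than against the core kernel image. The
    scripts/faddr2line helper in the kernel source tree should accept the
    symbol+offset/size form directly.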

    [2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)

    This happened while unloading a module: the faulting process is rmmod.

    [2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
    [2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
    [2406987.974523] cpuhp_issue_call from __cpuhp_state_remove_instance+0xf8/0x1b4
    [2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34 [zram]
    [2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170 [zram]
    [2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
    [2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
    [2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
    [2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
    [2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
    [2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c

    This is the entire backtrace, showing that only the zram module
    was involved.
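
    To check that reading mechanically, a small sketch like the following can
    list the call chain and the modules named in it. The function names
    modules_in_backtrace and call_chain are made up for this example, and the
    regular expression only covers the "callee from caller+0x.../0x..." layout
    printed by this ARM kernel.

```python
import re

# One frame: "callee [mod] from caller+0xoff/0xsize [mod]"; both module
# tags are optional.
FRAME_RE = re.compile(
    r"(?P<callee>[A-Za-z_]\w*)(?: +\[(?P<callee_mod>\w+)\])? +from +"
    r"(?P<caller>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: +\[(?P<caller_mod>\w+)\])?"
)

BACKTRACE = """\
[2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
[2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
[2406987.974523] cpuhp_issue_call from __cpuhp_state_remove_instance+0xf8/0x1b4
[2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34 [zram]
[2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170 [zram]
[2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
[2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
[2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
[2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
[2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
[2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c
"""


def modules_in_backtrace(log_text):
    """Collect every module name tagged on a frame (hypothetical helper)."""
    modules = set()
    for frame in FRAME_RE.finditer(log_text):
        for key in ("callee_mod", "caller_mod"):
            if frame.group(key):
                modules.add(frame.group(key))
    return modules


def call_chain(log_text):
    """Return function names from the faulting function outwards."""
    frames = list(FRAME_RE.finditer(log_text))
    if not frames:
        return []
    # The first callee is the faulting function; each caller is one level up.
    return [frames[0].group("callee")] + [f.group("caller") for f in frames]


print(modules_in_backtrace(BACKTRACE))
print(" <- ".join(call_chain(BACKTRACE)))
```

    Frames without a module tag (cpuhp_*, idr_for_each, sys_delete_module)
    belong to the core kernel, so zram is the only loadable module in the
    chain.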

    Linux-6.1 is fairly old, and this file has changed a bit between
    that and 6.13, though none of the changes here immediately point
    to a NULL pointer dereference:

    b8f03cb703a1 zram: move immutable comp params away from per-CPU context
    6a81bdfeb350 zram: introduce zcomp_ctx structure
    52c7b4e2ba50 zram: introduce zcomp_req structure
    f2bac7ad187d zram: introduce zcomp_params structure
    1a78390d8760 zram: check that backends array has at least one backend
    1d3100cf148d zram: add 842 compression backend support
    84112e314f69 zram: add zlib compression backend support
    73e7d81abbc8 zram: add zstd compression backend support
    c60a4ef54446 zram: add lz4hc compression backend support
    22d651c3b339 zram: add lz4 compression backend support
    2152247c55b6 zram: add lzo and lzorle compression backends support
    917a59e81c34 zram: introduce custom comp backends API
    45866e0e214f zram: do not allocate physically contiguous strm buffers
    7ac07a26dea7 zram: preparation for multi-zcomp support

    This is the code in question (from 6.13):

    static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
    {
        comp->ops->destroy_ctx(&zstrm->ctx);
        vfree(zstrm->buffer);
        zstrm->buffer = NULL;
    }

    int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
    {
        struct zcomp *comp = hlist_entry(node, struct zcomp, node);
        struct zcomp_strm *zstrm;

        zstrm = per_cpu_ptr(comp->stream, cpu);
        zcomp_strm_free(comp, zstrm);
        return 0;
    }

    If you disassemble the zram module (zram.ko, since the faulting code
    lives in the module rather than in vmlinux) with objdump, you can
    probably figure out whether the bug is dereferencing zstrm or comp. The
    other things I would try to narrow down the problem are:

    - unload the module manually during runtime
    - update the kernel to a more recent one, such as 6.12
    - use a different compression backend for zram (zstd, deflate, lzo, ...)
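
    On the zstrm-versus-comp question, the "Code:" line of the oops already
    gives a hint without objdump: the word in parentheses, e5114008, is the
    faulting instruction. The sketch below decodes it as an A32 load/store
    with a 12-bit immediate offset and applies the register values from the
    dump. The function names are invented for this example, and only this
    one instruction class is handled.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR (immediate offset) instruction word.

    Returns None for any other instruction class. Hypothetical helper that
    covers only what is needed for this oops.
    """
    cond = (word >> 28) & 0xF
    if cond == 0xF or (word >> 25) & 0b111 != 0b010:
        return None
    magnitude = word & 0xFFF             # imm12
    add = bool((word >> 23) & 1)         # U bit: add or subtract the offset
    return {
        "load": bool((word >> 20) & 1),         # L bit
        "byte": bool((word >> 22) & 1),         # B bit
        "pre_indexed": bool((word >> 24) & 1),  # P bit
        "rn": (word >> 16) & 0xF,               # base register
        "rt": (word >> 12) & 0xF,               # transferred register
        "offset": magnitude if add else -magnitude,
    }


def access_address(insn, regs):
    """Compute the address accessed, given a {register number: value} map.

    Ignores the special PC-relative case (rn == 15).
    """
    base = regs[insn["rn"]]
    if insn["pre_indexed"]:
        base += insn["offset"]
    return base & 0xFFFFFFFF


def describe(insn):
    """Render the plain offset-addressing form, e.g. 'ldr r4, [r1, #-8]'."""
    op = ("ldr" if insn["load"] else "str") + ("b" if insn["byte"] else "")
    return f"{op} r{insn['rt']}, [r{insn['rn']}, #{insn['offset']}]"


insn = decode_ldr_str_imm(0xE5114008)
regs = {0: 0x00000000, 1: 0x00000008, 4: 0x00000000}  # values from the oops
print(describe(insn))
print(hex(access_address(insn, regs)))
```

    The instruction is a load from r1 - 8, and with r1 = 00000008 that is
    address 0, which matches the reported faulting address. r1 carries the
    second argument (node) under the ARM calling convention. Assuming the
    node member sits at offset 8 in struct zcomp on this 32-bit build (not
    confirmed in this thread), this would be the read of comp->stream with
    comp being NULL, which points at comp rather than zstrm.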

    Arnd
