• Bug#1109880: libsqlite3-0: JSON-formatted parallel arrays =?UTF-8?Q?can

    From Manny@21:1/5 to All on Fri Jul 25 14:00:01 2025
    Package: libsqlite3-0
    Version: 3.40.1-2+deb12u1
    Severity: normal
    Tags: upstream
    X-Debbugs-Cc: [email protected]

    Apparently it is impossible to import data from a JSON tree that
    contains an array of records, each record of which contains another
    array. That’s the consequence of the bug. The bug is essentially that
    tables resulting from a clause formed as “json_each((select …),'$')”
    are incomplete because only the first row of input is processed by
    json_each.

    For a concrete example, take this JSON:

    $ curl -s 'https://lemmy-federate.com/api/instance.find?input=%7B%22search%22%3A%22%22%2C%22skip%22%3A0%2C%22take%22%3A6%2C%22enabledOnly%22%3Afalse%7D' > lemmy-federate_nodes.json

    This is how we might expect the innermost arrays to be imported:

    ===8<----------------------------------------
    #!/bin/bash

    sqlite3 -echo demo.db <<'EOF'
    create table if not exists relationTbl (
    id integer primary key,
    community_id integer,
    created text,
    updated text,
    status text,
    error_reason text,
    attempt_count text,
    hostname text,
    host_id text);

    insert or replace into relationTbl select
    json_extract(value,'$.id'),
    json_extract(value,'$.communityId'),
    json_extract(value,'$.createdAt'),
    json_extract(value,'$.updatedAt'),
    json_extract(value,'$.status'),
    json_extract(value,'$.errorReason'),
    json_extract(value,'$.attemptCount'),
    json_extract(value,'$.instance.host'),
    json_extract(value,'$.instance.id')
    from json_each((select json_extract(value,'$') from json_tree(readfile('lemmy-federate_communities.json'),'$.result.data.communities')
    where json_tree.type = 'array' and fullkey like '%follows'));

    EOF
    ===8<----------------------------------------

    The sample code executes and superficially appears to function, but
    running the following query reveals that only the first array is
    imported:

    ===8<----------------------------------------
    $ sqlite3 demo.db "select distinct community_id from relationTbl" ===8<----------------------------------------

    Six rows are expected from that query, but there is only one. This is
    somewhat dangerous because no error is produced and *some* data is
    imported. So the failure is not readily detectable, which can lead to
    data loss because someone might then delete their JSON source after
    importing.

    Other experiments reveal that json_tree has the same defect of only
    processing the first item of input when the input is another
    query. Other experiments also reveal an extraction defect when a JSON
    array is extracted from a column (this will be filed in a separate
    bug report).

    Bug 2:

    It’s worth noting as well that the possibility of feeding a (select …) query into json_each and json_tree is undocumented. Perhaps that’s intentional. I don’t know if SQlite has a requirements spec and if
    json_tree and json_each are included. One document (perhaps old) said
    these functions are “external” and not part of sqlite. Yet they are included in my stock version of sqlite and the most recent version of
    the docs (3.50.3) cover their existence -- just not the possibility of
    a query as input which I intuitively discovered by experimentation.

    Workaround:

    There is no apparent workaround within the confines of the sqlite library. But it’s possible to loop on the
    parallel arrays outside of sqlite and feed them back into sqlite, as follows:

    ===8<----------------------------------------
    while read json_array
    do
    sqlite3 demo.db <<EOF
    insert or replace into LFrelationTbl select
    json_extract(value,'$.id'),
    json_extract(value,'$.communityId'),
    json_extract(value,'$.createdAt'),
    json_extract(value,'$.updatedAt'),
    json_extract(value,'$.status'),
    json_extract(value,'$.errorReason'),
    json_extract(value,'$.attemptCount'),
    json_extract(value,'$.instance.host'),
    json_extract(value,'$.instance.id')
    from json_each('$json_array')
    EOF
    done <<< $(sqlite3 'file:///tmp?mode=memory' "select json_extract(value,'$.follows')
    from json_each(readfile('lemmy-federate_communities.json'),'$.result.data.communities');")
    ===8<----------------------------------------

    -- System Information:
    Debian Release: 12.11
    APT prefers stable-updates
    APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 'stable'), (500, 'oldstable')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads)
    Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
    Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages libsqlite3-0 depends on:
    ii libc6 2.36-9+deb12u10

    libsqlite3-0 recommends no packages.

    libsqlite3-0 suggests no packages.

    -- no debconf information

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)