• Re: trying to parse lines from an awkwardly formatted HAR file ...

    From [email protected]@21:1/5 to Albretch Mueller on Sat Mar 23 07:50:01 2024
    On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
    > out of a HAR file containing lots of obfuscating js cr@p and all kinds of
    > nonsense I was able to extract line looking like:

    It's not "js cr@p"; it is called JSON. And there's a spec for
    it.

    [...]

    > I have tried substring substitution, sed et tr to no avail.

    You might have a lot of fun trying to parse JSON with sed and
    tr.

    If you are serious about it, you should try a proper parser
    and extractor. I'd recommend jq [1], available in Debian under
    the same-named package. I have written a few shell scripts
    reaching into the innards of

    You'll have to wrap your brain around it, but in the time you
    have implemented a parser for js in "sed and tr" (you might
    need a dash of "proper programming language" around that, some
    luck and a ton of elbow grease) you might have wrapped your
    brain like 16 times around jq (or some other appropriate tool).
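
    A minimal sketch of what "a proper parser" buys here, assuming the
    extracted line is the *content* of a JSON string (which would explain
    the backslash-escaped quotes). The record is shortened from the one
    posted in the thread, and the variable names are illustrative only.
    Wrapping the text in double quotes makes it a valid JSON string, and
    jq's fromjson then decodes the JSON document held inside it:

```shell
# Shortened sample of the escaped line; single quotes keep the backslashes literal.
escaped='{\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"year\":1843}}'

# Wrap in double quotes -> a JSON string; fromjson parses the JSON inside it.
id=$(printf '"%s"' "$escaped" | jq -r 'fromjson | .fields.identifier')
year=$(printf '"%s"' "$escaped" | jq -r 'fromjson | .fields.year')

printf '%s|%s\n' "$id" "$year"
```

    No sed or tr is involved: the unescaping is done by the JSON parser
    itself, so it follows the JSON rules rather than a hand-written pattern.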

    Cheers
    --
    tomás


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to Albretch Mueller on Sat Mar 23 09:20:01 2024
    Here's a hint at a start of what you need to do, it should be pretty easy to extend this, if it's unclear, let me know:

    for starters, run your "gunk" into jq like this:

    $ echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq
    {
      "index": "prod-h-006",
      "fields": {
        "identifier": "bub_gb_O2EAAAAAMAAJ",
        "title": "Die Wissenschaft vom subjectiven Geist",
        "creator": [
          "Karl Rosenkranz",
          "Mr. ABC123"
        ],
        "collection": [
          "europeanlibraries",
          "americana"
        ],
        "year": 1843,
        "language": [
          "German"
        ],
        "item_size": 797368506
      },
      "_score": [
        50.629513
      ]
    }

    then, start building your output like this:

    echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq '.fields.identifier + "|" + .fields.title'
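
    Carrying the concatenation through to the full line requested in the
    thread might look like the sketch below. It is one possible approach,
    not the poster's own script: the helper name arr is made up for this
    example, and the record is stored in a single-quoted shell variable
    so that the shell does not touch the braces and brackets.

```shell
record='{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}'

line=$(printf '%s\n' "$record" | jq -r '
  # arr (hypothetical helper): render an array as [a, b] with ", " between elements
  def arr: "[" + (map(tojson) | join(", ")) + "]";
  [ .fields.identifier,
    .fields.title,
    (.fields.creator | arr),
    (.fields.collection | arr),
    (.fields.year | tostring),
    (.fields.language | arr),
    (.fields.item_size | tostring),
    (._score | arr)
  ] | join("|")')

printf '%s\n' "$line"
```

    The numbers are converted with tostring before joining so that every
    element handed to join is a string.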

    jq is an amazing tool; it's a full-fledged programming language. You just need to continue concatenating your desired output. You might even find you can do everything you want inside a jq script instead of what you're doing now. Consider writing a jq
    script whose first line is #!/usr/bin/jq -f
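
    A sketch of such a script follows. A jq script run through a shebang
    needs jq's -f (--from-file) option on that line, so that jq reads its
    program from the script file; jq itself treats lines starting with #
    as comments. The temporary file and the shortened sample record are
    assumptions made for this example, and the script is invoked
    explicitly with jq -f here so that it does not depend on where jq is
    installed.

```shell
script=$(mktemp)

# Write the jq program to a file; the quoted heredoc keeps it verbatim.
cat > "$script" <<'EOF'
#!/usr/bin/jq -f
# Build one pipe-delimited line per input record.
.fields.identifier + "|" + .fields.title
EOF

out=$(printf '%s\n' '{"fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist"}}' \
  | jq -r -f "$script")

printf '%s\n' "$out"
rm -f "$script"
```

    After chmod +x the file could also be run directly, provided jq
    really lives at /usr/bin/jq; without -r the result would be printed
    as a JSON string, quotes included.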

    Hope this gets you on the right path!

    Michael Grant


  • From David Christensen@21:1/5 to Albretch Mueller on Sat Mar 23 12:40:01 2024
    On 3/22/24 22:53, Albretch Mueller wrote:
    > out of a HAR file containing lots of obfuscating js cr@p and all kinds of
    > nonsense I was able to extract line looking like:
    >
    > var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]}'
    > echo "// __ \$var00: |$var00|"
    >
    > The final result that I need would look like:
    >
    > var02='bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]'
    > echo "// __ \$var02: |$var02|"
    >
    > I have tried substring substitution, sed et tr to no avail.
    >
    > lbrtchx


    My daily driver:

    2024-03-23 04:02:27 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
    $ cat /etc/debian_version; uname -a; perl -v | head -n 2 | grep .
    11.9
    Linux laalaa 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
    This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi


    Put the JSON into a data file, one record per line (my mailer is
    line-wrapping data.json -- it contains two lines):

    2024-03-23 04:22:20 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
    $ cat data.json
    {"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}
    {"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My Title","creator":["Some Body", "Somebody Else"],"collection":["europeanlibraries", "americana"],"year":2024,"language":["English"],"item_size":1234567890},"_score":[12.345678]}


    A Perl script to read newline-delimited JSON records and pretty print each:

    2024-03-23 04:28:59 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
    $ cat munge-json
    #!/usr/bin/perl
    # $Id: munge-json,v 1.3 2024/03/23 11:28:58 dpchrist Exp $
    # Refer to debian-user 3/22/24 22:53 Albretch Mueller
    # "trying to parse lines from an awkwardly formatted HAR file"
    # by David Paul Christensen [email protected]
    # Public Domain

    use strict;
    use warnings;

    use Data::Dumper;
    use JSON;
    use Getopt::Long;

    $Data::Dumper::Sortkeys = 1;

    my $debug;

    GetOptions('debug|d' => \$debug) or die;

    # Read one JSON record per line from the named files or from stdin.
    while (<>) {
        my $rh = decode_json $_;
        print Data::Dumper->Dump([$rh], [qw(rh)]) if $debug;

        # Print the selected fields as one pipe-delimited line; arrays are
        # rendered as ["a", "b"] to match the requested output.
        print
            join('|',
                $rh->{fields}{identifier},
                $rh->{fields}{title},
                '["' . join('", "', @{$rh->{fields}{creator}}) . '"]',
                '["' . join('", "', @{$rh->{fields}{collection}}) . '"]',
                $rh->{fields}{year},
                '["' . join('", "', @{$rh->{fields}{language}}) . '"]',
                $rh->{fields}{item_size},
                '[' . join(', ', @{$rh->{_score}}) . ']',
            ), "\n";
    }


    Run the script as a Unix filter:

    2024-03-23 04:30:16 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
    $ ./munge-json data.json
    bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
    abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]

    2024-03-23 04:30:18 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
    $ cat data.json | ./munge-json
    bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
    abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]
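
    For comparison, jq reads a file of newline-delimited records the same
    way, applying its filter once per record, so no explicit loop is
    needed. The sketch below uses trimmed copies of the two records above
    (only three fields each) and a temporary file; both are assumptions
    made to keep the example short.

```shell
data=$(mktemp)

# Two trimmed records, one JSON document per line.
cat > "$data" <<'EOF'
{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","year":1843}}
{"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My Title","year":2024}}
EOF

# The filter runs once for every record in the file.
lines=$(jq -r '.fields | "\(.identifier)|\(.title)|\(.year)"' "$data")

printf '%s\n' "$lines"
rm -f "$data"
```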


    David

  • From Greg Wooledge@21:1/5 to [email protected] on Sat Mar 23 14:40:02 2024
    On Sat, Mar 23, 2024 at 08:18:46AM +0000, [email protected] wrote:
    > jq is an amazing tool; it's a full-fledged programming language. You just
    > need to continue concatenating your desired output. You might even find
    > you can do everything you want inside a jq script instead of what you're
    > doing now. Consider writing a jq script whose first line is
    > #!/usr/bin/jq -f

    Yeah. This. If you have to process JSON inputs in a bash script, use jq.
    Do not attempt to roll your own JSON parser. That work has already been
    done, and besides that, bash is a *terrible* language in which to write
    a parser.

    Here's an *incredibly* brief glimpse. It can do so much more:

    hobbit:~$ json='{"foo":"bar", "names": ["Alice","Bob"], "phone": {"Alice":"555-1234", "Bob":"555-2345"}}'
    hobbit:~$ printf %s "$json" | jq
    {
      "foo": "bar",
      "names": [
        "Alice",
        "Bob"
      ],
      "phone": {
        "Alice": "555-1234",
        "Bob": "555-2345"
      }
    }
    hobbit:~$ printf %s "$json" | jq .foo
    "bar"
    hobbit:~$ printf %s "$json" | jq -r .foo
    bar
    hobbit:~$ printf %s "$json" | jq -r .phone.Alice
    555-1234
    hobbit:~$ printf %s "$json" | jq -r '.names[1]'
    Bob
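
    As one more small step in the same direction, a filter can iterate
    over an array and use each element as a key into another object. This
    sketch reuses the same sample data; the variable name book is made up
    for the example.

```shell
json='{"foo":"bar", "names": ["Alice","Bob"], "phone": {"Alice":"555-1234", "Bob":"555-2345"}}'

# For each name, look up the matching entry in .phone and format one line.
book=$(printf '%s' "$json" | jq -r '.names[] as $name | "\($name): \(.phone[$name])"')

printf '%s\n' "$book"
```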

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Sat Mar 23 17:00:02 2024
    On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
    > a) using a chromium-derived browser, which can be used to dump the
    > HAR file log of the network back and forth, go, e. g.:
    > https://en.wikipedia.org/wiki/Anaxagoras
    > b) click on the link that says: "Works by or about Anaxagoras" (at
    > Internet Archive)
    > c) on the archive.org page, select "texts" and "always available"
    > (meaning text which is public domain, he died 25 centuries ago)
    > d) then to produce the HAR file, go:
    > d.1) More Tools > Developer Tools;
    > d.2) click on "Network" tab;
    > d.3) Filter: GET
    > d.4) check: "Preserve Log"
    > d.5) scroll down the page all the way to make the client-server back
    > and forth cascade
    > d.6) save the network log as HAR file to then open and eyeball it!

    This is incomprehensible to me. What the hell is d.5 supposed to be?
    Even if I close the Shift-Ctrl-I window, and Ctrl-R to reload the page,
    and then reopen Shift-Ctrl-I, and click the down-arrow-in-a-dish icon
    whose tooltip says "Export HAR..." all I get in the resulting file
    is this:

    hobbit:~$ cat Downloads/archive.org.har
    {
      "log": {
        "version": "1.2",
        "creator": {
          "name": "WebInspector",
          "version": "537.36"
        },
        "pages": [],
        "entries": []
      }
    }hobbit:~$
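
    A quick way to confirm that such an export captured nothing is to
    count the entries with jq. This is a sketch using the file content
    shown above, held in a shell variable for convenience:

```shell
har='{"log":{"version":"1.2","creator":{"name":"WebInspector","version":"537.36"},"pages":[],"entries":[]}}'

# Number of recorded request/response pairs in the HAR log.
count=$(printf '%s' "$har" | jq '.log.entries | length')

if [ "$count" -eq 0 ]; then
  echo "no requests were captured"
fi
```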

    Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?
    Something that doesn't take 12 manual steps that are impossible to
    perform?

    Or can you *attach* one to a message to this mailing list? Make sure
    it's small.

    > 1) That HAR file is not properly formatted. Instead of
    > "attribute":value pairs in the standard way, they have used front
    > slash + quote pairs (instead of just quotes) erratically all around
    > the file. That is why you can't use jq.

    That is not what I see in the file which I pasted here.
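
    One possible explanation, which is not confirmed anywhere in this
    thread: a HAR file stores each response body as a JSON string, so a
    body that is itself JSON shows up with backslash-escaped quotes while
    the file as a whole remains valid JSON. The sketch below builds a
    tiny synthetic HAR to illustrate this. The sample body is made up,
    and the path .log.entries[].response.content.text follows the HAR
    layout as I understand it.

```shell
# A response body that happens to be JSON (hypothetical sample).
body='{"fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","year":1843}}'

# Embed it as a string inside a minimal HAR-like document.
har=$(jq -n --arg body "$body" '{log: {entries: [{response: {content: {text: $body}}}]}}')

# In the serialized file the body's quotes appear as \" sequences.
printf '%s\n' "$har"

# Decode the embedded string, then query it like any other JSON.
id=$(printf '%s' "$har" | jq -r '.log.entries[].response.content.text | fromjson | .fields.identifier')
printf '%s\n' "$id"
```

    A real capture also contains bodies that are not JSON (HTML, images,
    scripts), for which fromjson would fail, so a real filter would need
    to select the relevant entries first.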

  • From Greg Wooledge@21:1/5 to Greg Wooledge on Sat Mar 23 17:40:01 2024
    On Sat, Mar 23, 2024 at 11:55:04AM -0400, Greg Wooledge wrote:
    > On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
    >> 1) That HAR file is not properly formatted. Instead of
    >> "attribute":value pairs in the standard way, they have used front
    >> slash + quote pairs (instead of just quotes) erratically all around
    >> the file. That is why you can't use jq.
    >
    > That is not what I see in the file which I pasted here.

    Further investigation:

    https://google.com/search?q=what+is+a+HAR+file

    https://www.keycdn.com/support/what-is-a-har-file
    Jan 12, 2023 — A HAR file is primarily used for identifying
    performance issues, such as bottlenecks and slow load times, and page
    rendering problems.

    https://en.wikipedia.org/wiki/HAR_(file_format)
    The HTTP Archive format, or HAR, is a JSON-formatted archive file
    format for logging of a web browser's interaction with a site.
    ...
    This document was never published by the Web Performance Working Group
    and has been abandoned.

    So, putting these together, it looks like you are taking a file that
    was intended to be used for diagnosing browser/network performance
    issues, and attempting to use this in place of a downloadable index
    of documents from archive.org.

    Furthermore, whatever method you are using to *create* this HAR file
    is questionable, since apparently you aren't even getting a properly
    formatted file in the end.

    This tells me we're deep inside an X-Y problem. The original goal is
    possibly something like "I want an index of all the books about this
    Greek dude". Maybe start from there, and see what answers you get.

  • From Darac Marjal@21:1/5 to All on Sat Mar 23 17:50:01 2024
    On 23/03/2024 16:34, Greg Wooledge wrote:
    > [...]
    > This tells me we're deep inside an X-Y problem. The original goal is
    > possibly something like "I want an index of all the books about this
    > Greek dude". Maybe start from there, and see what answers you get.

    If someone was looking to query a Web service programmatically, wouldn't
    the first place to start be seeing if the service has an API?

    Archive.org has a well-documented API at
    https://archive.org/developers/. There's even a command-line tool
    (assuming one doesn't want to use, say, the python library).

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Sat Mar 23 20:40:01 2024
    On Sat, Mar 23, 2024 at 02:05:06PM -0500, Albretch Mueller wrote:
    > Actually, in order to deX-Y it in case anyone can offer any help, it
    > is more like "I want an index of all the books which have ever been
    > written/published" in order to read all of them ;-)

    First of all, you will not achieve this goal. It is not possible for
    a human to read every book that has ever been written. You'll die
    before you can even finish a tiny fraction of them.

    So, let's say you have a more realistic goal: you want a list of all the
    books written by Charles Dickens.

    I tried to figure out how to get this out of archive.org but it looks
    like their documentation doesn't match their web page. I started at <https://archive.org/developers/simplelists.html> which shows how to
    get a list of "items" which all share a common "parent". I figure
    an author might be a reasonable parent. So then the next question is
    how to get the author ID for Charles Dickens.

    Next I went to <https://archive.org/developers/tutorial-find-identifier-item.html>
    which tells me I should perform a search on their front page, and
    then on the result page, click something called "Media List".

    This is where it all falls apart for me. I can't find a "Media List"
    thing to click on.

    The documentation also mentions an "ABOUT" that I should be able
    to click on to get an Identifier. Well, that's not a thing I could
    find either. There's an ABOUT link in the top menu bar, which goes to <https://archive.org/about/> which is clearly not what the documentation
    was talking about.

    All this is far too much of my time wasted trying to help some random
    person with an off-topic question on debian-user, so... good luck.
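
    For the Charles Dickens example, a query URL for archive.org's search
    endpoint could be assembled as sketched below. The endpoint
    (advancedsearch.php), its parameters, the query syntax and the shape
    of the reply are assumptions that do not come from this thread and
    have not been verified here, so the actual request is left as a
    comment.

```shell
# Search expression (assumed query syntax).
query='creator:"Charles Dickens" AND mediatype:texts'

# Percent-encode the expression with jq's @uri format.
encoded=$(jq -n -r --arg q "$query" '$q | @uri')

# fl%5B%5D is the encoded form of the fl[] parameter (fields to return).
url="https://archive.org/advancedsearch.php?q=${encoded}&fl%5B%5D=identifier&fl%5B%5D=title&rows=50&output=json"
printf '%s\n' "$url"

# Untested assumption about the reply layout:
# curl -s "$url" | jq -r '.response.docs[] | "\(.identifier)|\(.title)"'
```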

  • From David Wright@21:1/5 to Greg Wooledge on Tue Mar 26 17:10:01 2024
    On Sat 23 Mar 2024 at 11:55:04 (-0400), Greg Wooledge wrote:
    > On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
    >> a) using a chromium-derived browser, which can be used to dump the
    >> HAR file log of the network back and forth, go, e. g.:
    >> https://en.wikipedia.org/wiki/Anaxagoras
    >> b) click on the link that says: "Works by or about Anaxagoras" (at
    >> Internet Archive)
    >> c) on the archive.org page, select "texts" and "always available"
    >> (meaning text which is public domain, he died 25 centuries ago)
    >> d) then to produce the HAR file, go:
    >> d.1) More Tools > Developer Tools;
    >> d.2) click on "Network" tab;
    >> d.3) Filter: GET
    >> d.4) check: "Preserve Log"
    >> d.5) scroll down the page all the way to make the client-server back
    >> and forth cascade
    >> d.6) save the network log as HAR file to then open and eyeball it!
    >
    > This is incomprehensible to me. What the hell is d.5 supposed to be?
    > Even if I close the Shift-Ctrl-I window, and Ctrl-R to reload the page,

    Some web pages don't load completely unless you scroll down them,
    whereupon more of the page is loaded. Even if you press End, you may
    not get the whole page loaded. One method of completion is to
    repeatedly press End and PageUp until no more content appears
    (or you observe some sort of bottom-of-page indication).

    You'll recognise this if you shop with Kroger™/Dillons™/Fry's™
    (in the US).

    Ctrl-R is of no help: it can merely reload as much of the page as has
    been visited so far. So there is some method in their madness (for
    this one step—I don't know about the rest).

    > and then reopen Shift-Ctrl-I, and click the down-arrow-in-a-dish icon
    > whose tooltip says "Export HAR..." all I get in the resulting file
    > is this:

    Cheers,
    David.
