• Understanding pdfseparate error messages

    From Richard Owlett@21:1/5 to All on Thu Jul 24 15:30:01 2025
    I'm running Debian 12.8.
    I have a 100+ page PDF document.
    I wish to extract 2 of those pages, each to their own PDF file for later editing.

    I'm focusing on poppler-utils as it appears to offer tools for current
    and future goals.

    Doing "pdftotext -layout -f 116 -l 116 TFP2021.pdf jul24-a.txt" comes
    very close to what I want.

    Having been surrounded by TECO-buffs in the 70's, comparing the output
    of "pdftotext -f 116 -l 116 TFP2021.pdf jul24-b.txt" to the above
    suggests an approach to resolving.

    It involves being able to edit a *SINGLE* rather than all 100+ companion
    pages.

    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    How/where do I find interpretation of those?

    TIA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to Richard Owlett on Thu Jul 24 18:00:01 2025
    Richard Owlett <[email protected]> wrote:
    > I'm running Debian 12.8.
    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file for
    > later editing.
    >
    > I'm focusing on poppler-utils as it appears to offer tools for
    > current and future goals.
    >
    > Doing "pdftotext -layout -f 116 -l 116 TFP2021.pdf jul24-a.txt" comes
    > very close to what I want.
    >
    > Having been surrounded by TECO-buffs in the 70's, comparing the
    > output of "pdftotext -f 116 -l 116 TFP2021.pdf jul24-b.txt" to the
    > above suggests an approach to resolving.

    I don't understand the paragraph above, especially what the mention
    of TECO implies.

    > It involves being able to edit a *SINGLE* rather than all 100+
    > companion pages.
    >
    > I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    > [multiple repetitions of those 2 lines]
    > Syntax Error (3556857): Bad FCHECK in flate stream
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3866517): Bad FCHECK in flate stream
    >
    > How/where do I find interpretation of those?

    Good question that I'm not able to help answer, I'm afraid.

    But looking at the messages suggests that your PDF file may not be
    perfectly formed, so I suggest trying to validate it. There seems to be
    no shortage of PDF validators online!
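    For a rough idea of what "Missing 'endstream' or incorrect stream length" is about: in a PDF, each stream's dictionary declares a /Length, the reader takes that many bytes after the `stream` keyword and its end-of-line, and then expects the `endstream` keyword. The number in parentheses in poppler's messages looks like a byte position in the file, but that is an assumption. As a minimal sketch of the idea (the function `check_stream_lengths` is hypothetical, not part of any tool), the Python below flags streams whose declared /Length does not land on `endstream`. It only handles direct integer lengths, not indirect ones such as `/Length 12 0 R`, so it is no replacement for a real validator.

```python
import re

# ">>" closing a stream dictionary, then the "stream" keyword and its EOL.
STREAM_KW = re.compile(rb">>\s*stream(?:\r\n|\n)")
# A direct "/Length <int>"; "\b" stops a partial digit match, and the
# lookahead rejects indirect references such as "/Length 12 0 R".
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")
# Optional end-of-line whitespace followed by the "endstream" keyword.
ENDSTREAM_RE = re.compile(rb"\s*endstream")


def check_stream_lengths(pdf: bytes) -> list[tuple[int, int, bool]]:
    """Return (data_offset, declared_length, endstream_found) per stream.

    Rough sketch only: streams whose /Length is indirect or missing are
    skipped, and the dictionary is assumed to begin at the nearest
    preceding "obj" keyword.
    """
    results = []
    for m in STREAM_KW.finditer(pdf):
        data_start = m.end()  # first byte of the stream data
        dict_start = max(pdf.rfind(b"obj", 0, m.start()), 0)
        lengths = LENGTH_RE.findall(pdf, dict_start, m.start())
        if not lengths:
            continue
        declared = int(lengths[-1])
        # If /Length is right, "endstream" follows the declared byte count.
        found = ENDSTREAM_RE.match(pdf, data_start + declared) is not None
        results.append((data_start, declared, found))
    return results


if __name__ == "__main__":
    good = b"1 0 obj\n<< /Length 5 >>\nstream\nHELLO\nendstream\nendobj\n"
    bad = good.replace(b"/Length 5", b"/Length 9")  # overstated length
    print(check_stream_lengths(good))  # [(31, 5, True)]
    print(check_stream_lengths(bad))   # [(31, 9, False)]
```

    To try it on the real file, something like `check_stream_lengths(open("TFP2021.pdf", "rb").read())` should work, although streams with indirect lengths are simply skipped.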

    Another approach might be to try using one of the other tools that were suggested rather than poppler. They may produce clearer error messages.

  • From Richard Owlett@21:1/5 to Richard Owlett on Thu Jul 24 17:40:01 2025
    On 7/24/25 8:20 AM, Richard Owlett wrote:
    > I'm running Debian 12.8.
    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file for later editing.
    >
    > I'm focusing on poppler-utils as it appears to offer tools for current
    > and future goals.
    >
    > Doing "pdftotext -layout -f 116 -l 116 TFP2021.pdf jul24-a.txt" comes
    > very close to what I want.
    >
    > Having been surrounded by TECO-buffs in the 70's, comparing the output
    > of "pdftotext -f 116 -l 116 TFP2021.pdf jul24-b.txt" to the above
    > suggests an approach to resolving.
    >
    > It involves being able to edit a *SINGLE* rather than all 100+ companion pages.
    >
    > I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    > [multiple repetitions of those 2 lines]
    > Syntax Error (3556857): Bad FCHECK in flate stream
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3866517): Bad FCHECK in flate stream
    >
    > How/where do I find interpretation of those?
    >
    > TIA


    *A postscript*

    I had originally composed this message before discovering that
    "pdfseparate" had created output files that appear to be what I intended.

    I'm still interested in the meaning of the error messages, as they may
    hint at why "pdftotext" wasn't *exactly* what I hoped for.
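    As for "Bad FCHECK in flate stream": Flate-compressed PDF streams use the zlib format of RFC 1950, which starts with two header bytes, CMF and FLG. FLG carries five FCHECK bits chosen so that CMF * 256 + FLG is a multiple of 31. It seems likely (an assumption based on the message text, not on poppler's documentation) that poppler reports "Bad FCHECK" when a stream it expects to be compressed does not start with a header that passes this test, for example because the data is damaged or a stream length or offset is wrong. A minimal Python sketch of the test, with a hypothetical helper name `zlib_header_ok`:

```python
import zlib


def zlib_header_ok(data: bytes) -> bool:
    """Check the two-byte zlib header (RFC 1950) at the start of data.

    CMF (byte 0) must name compression method 8 (deflate) in its low
    four bits, and CMF * 256 + FLG (byte 1) must be a multiple of 31;
    the FCHECK bits in FLG exist only to make that sum divisible.
    """
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    if (cmf & 0x0F) != 8:
        return False
    return (cmf * 256 + flg) % 31 == 0


if __name__ == "__main__":
    good = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
    bad = bytes([good[0], good[1] ^ 0x01]) + good[2:]  # flip one FLG bit
    print(zlib_header_ok(good))  # True
    print(zlib_header_ok(bad))   # False
```

    Flipping a single header bit is enough to fail the check, which is one plausible way damaged data, or a reader starting at the wrong offset, could trigger this message.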

  • From Hans@21:1/5 to All on Thu Jul 24 20:00:01 2025
    On Thursday, 24 July 2025, 19:24:37 CEST, David Wright wrote:
    > On Thu 24 Jul 2025 at 09:13:02 (-0500), David Wright wrote:
    >> On Thu 24 Jul 2025 at 08:20:33 (-0500), Richard Owlett wrote:
    >>> I'm running Debian 12.8.
    >>> I have a 100+ page PDF document.
    >>> I wish to extract 2 of those pages, each to their own PDF file for
    >>> later editing.
    >>>
    >>> I'm focusing on poppler-utils as it appears to offer tools for current

    If you do not need to do this automatically, you could also load the
    100+ page file into pdfarranger, then mark all pages and unmark the two
    you want to extract.

    Now delete the marked pages; the two unmarked ones will stay.

    Save the result under another name.

    You can edit it with okular (maybe evince, too).

    Easy, peasy.
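    For the case where it does need to be automated, a minimal Python sketch that calls poppler's pdfseparate once per page, so each page ends up in its own file. It assumes pdfseparate (Debian package poppler-utils) is on the PATH and relies on its documented behaviour of replacing %d in the output pattern with the page number. The helper names, the output pattern `page-%d.pdf`, and page 117 are hypothetical; only page 116 comes from this thread.

```python
import shutil
import subprocess


def pdfseparate_cmd(src: str, page: int, pattern: str = "page-%d.pdf") -> list[str]:
    """Build a pdfseparate command that writes a single page.

    pdfseparate substitutes the page number for %d in the output pattern,
    so each requested page ends up in its own file.
    """
    return ["pdfseparate", "-f", str(page), "-l", str(page), src, pattern]


def extract_pages(src: str, pages: list[int], pattern: str = "page-%d.pdf") -> None:
    """Run pdfseparate once per page; raise if it exits with an error status."""
    if shutil.which("pdfseparate") is None:
        raise RuntimeError("pdfseparate not found (Debian package: poppler-utils)")
    for page in pages:
        subprocess.run(pdfseparate_cmd(src, page, pattern), check=True)


if __name__ == "__main__":
    # 116 is the page used in the thread; 117 stands in for the second page.
    for page in (116, 117):
        print(" ".join(pdfseparate_cmd("TFP2021.pdf", page)))
    # extract_pages("TFP2021.pdf", [116, 117])  # actually run pdfseparate
```

    If the two wanted pages happen to be adjacent, a single `pdfseparate -f 116 -l 117 TFP2021.pdf page-%d.pdf` does the same job, since pdfseparate writes every page in the range to its own file.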

    Best

    Hans

  • From David Wright@21:1/5 to Richard Owlett on Thu Jul 24 20:10:01 2025
    On Thu 24 Jul 2025 at 08:20:33 (-0500), Richard Owlett wrote:
    > I'm running Debian 12.8.
    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file for
    > later editing.
    >
    > I'm focusing on poppler-utils as it appears to offer tools for current
    > and future goals.
    >
    > Doing "pdftotext -layout -f 116 -l 116 TFP2021.pdf jul24-a.txt" comes
    > very close to what I want.
    >
    > Having been surrounded by TECO-buffs in the 70's, comparing the output
    > of "pdftotext -f 116 -l 116 TFP2021.pdf jul24-b.txt" to the above
    > suggests an approach to resolving.
    >
    > It involves being able to edit a *SINGLE* rather than all 100+
    > companion pages.
    >
    > I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    > [multiple repetitions of those 2 lines]
    > Syntax Error (3556857): Bad FCHECK in flate stream
    > Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    > Syntax Error (3866517): Bad FCHECK in flate stream
    >
    > How/where do I find interpretation of those?

    Why on earth are you trying to debug these errors from either the
    PDF or pdfseparate? What's wrong with its output, apart from its size? pdftotext can happily convert dianostic.pdf.

    I would point out that pdfseparate produces a 3917970-byte PDF,
    whereas pdftk's output is 39666 bytes. Their converted text files
    are identical.
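    A small standard-library sketch for repeating this comparison on your own output: it reports both PDF sizes and checks whether the two extracted text files have identical contents. The file names in the commented usage line are hypothetical; the two byte counts are the ones quoted above.

```python
import filecmp
import os


def size_ratio(bytes_a: int, bytes_b: int) -> float:
    """How many times larger file A is than file B."""
    return bytes_a / bytes_b


def compare_outputs(pdf_a: str, pdf_b: str, txt_a: str, txt_b: str):
    """Return (size_a, size_b, texts_identical) for two extractions."""
    size_a = os.path.getsize(pdf_a)
    size_b = os.path.getsize(pdf_b)
    # shallow=False makes filecmp compare the file contents, not just stat info.
    same_text = filecmp.cmp(txt_a, txt_b, shallow=False)
    return size_a, size_b, same_text


if __name__ == "__main__":
    # Sizes reported in the thread: pdfseparate output vs pdftk output.
    print(f"{size_ratio(3917970, 39666):.1f}x")  # 98.8x
    # Hypothetical file names:
    # print(compare_outputs("sep.pdf", "tk.pdf", "sep.txt", "tk.txt"))
```

    On those numbers the pdfseparate file is roughly 99 times the size of the pdftk one, for the same extracted text.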

    Cheers,
    David.

    Table A4.14. Thrifty Food Plan Market Basket for males age 71 and older, June 2021:
    quantities, costs, and cost shares of Market Basket Categories^a in weekly amounts

                                                  Quantity^b      Cost^c  Cost share^d
    Market Basket Categories                       (lbs)         ($)           (%)

    Vegetables                                      9.13       10.37         20.67
      Dark-green vegetables                         0.82        1.49         14.38
      Red and orange vegetables                     1.95        2.71         26.12
      Beans, peas, lentils^e                        1.72        1.53         14.78
      Starchy vegetables                            2.92        2.36         22.76
      Other vegetables                              1.72        2.28         21.97
    Fruits                                          6.74        6.76         13.46
      Whole fruit                                   4.62        5.05         74.82
      100% fruit juice                              2.12        1.70         25.18
    Grains                                          3.59        7.43         14.81
      Whole-grain staple grains (e.g.,              2.06        4.93         66.34
        rice, pasta, breads, tortillas)
      Whole-grain cereals (e.g.,                   <0.01       <0.01         <0.01
        oatmeal, ready-to-eat cereal)^f
      Refined-grain staple grains (e.g.,            1.45        2.25         30.31
        rice, pasta, breads, tortillas)
      Refined-grain other (e.g., cereals,           0.09        0.25          3.34
        crackers, snacks)
    Dairy                                          12.36        7.35         14.65
      Low- and non-fat milk, yogurt,               12.27        7.23         98.31
        soy alternatives^g
      Higher fat milk, yogurt, soy                  0.08        0.12          1.69
        alternatives^h
      Cheese                                        0.00        0.00          0.00
    Protein foods                                   5.60       15.86         31.60
      Meats                                         0.68        2.79         17.60
      Poultry                                       2.33        5.85         36.87



    Thrifty Food Plan • 2021 106


  • From David Wright@21:1/5 to David Wright on Thu Jul 24 19:40:01 2025
    [2nd attempt]
