• Re: Redirect output from less to text editor

    From Spiros Bousbouras@21:1/5 to Ottavio Caruso on Thu Apr 21 15:07:12 2022
    On Thu, 21 Apr 2022 15:47:32 +0100
    Ottavio Caruso <[email protected]> wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    If the editors in question support an option to read from stdin then you
    can use that through a pipe. Otherwise I can't think of anything. Beyond
    that, I don't think that less really does any formatting; it just wraps
    lines and perhaps a bit more. A decent text editor should be able to do
    this on its own, so I'm not clear why you want to involve less. Note that
    there is also the fmt utility to wrap lines.

    and 2)

    Any way to clean up unprintable characters before sending them to xed?

    As long as you have a clear idea what the unprintable characters are, then
    you can use sed. For example

    sed -e 's/\o000//g' -e 's/\o001//g' file.pdf

    will omit octets with value 0 or 1. I think the \o000 syntax is GNU
    specific.
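
Where the GNU-only \o000 escape is a concern, tr may be a more portable route, since POSIX tr accepts octal escapes and ranges. The following is only a sketch: the function name strip_ctrl is made up for illustration, and keeping tab, newline and carriage return is an assumption about which characters count as wanted.

```shell
# strip_ctrl: hypothetical helper name, not from the thread.
# Deletes ASCII control bytes from stdin, except tab (\011),
# newline (\012) and carriage return (\015). DEL (\177) is deleted too.
# LC_ALL=C makes tr work on single bytes rather than locale characters.
strip_ctrl() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}
```

It reads stdin and writes stdout, so it can sit anywhere in a pipeline:

    $ strip_ctrl < file.pdf > /tmp/file.txt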

    --
    vlaho.ninja/prog

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ottavio Caruso@21:1/5 to All on Thu Apr 21 15:47:32 2022
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    and 2)

    Any way to clean up unprintable characters before sending them to xed?

    --
    Ottavio Caruso

  • From Chris Elvidge@21:1/5 to Ottavio Caruso on Thu Apr 21 17:59:33 2022
    On 21/04/2022 15:47, Ottavio Caruso wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    and 2)

    Any way to clean up unprintable characters before sending them to xed?


    without using less : pdftotext?

    --
    Chris Elvidge
    England

  • From Keith Thompson@21:1/5 to Ottavio Caruso on Thu Apr 21 11:07:28 2022
    Ottavio Caruso <[email protected]> writes:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If that works (it doesn't for me) then you probably have less configured
    to use an input preprocessor via $LESSOPEN. "man less" for details.

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    and 2)

    Any way to clean up unprintable characters before sending them to xed?

    You can probably use the same filter used by LESSOPEN.
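
One way to reuse that filter outside less might look like the sketch below. It assumes $LESSOPEN holds a pipe-style preprocessor (a value starting with "|", such as "| /usr/bin/lesspipe %s"); the older style that prints the name of a replacement file is not handled. The function name lessopen_filter is hypothetical.

```shell
# lessopen_filter: hypothetical helper, not part of less.
# Runs the pipe-style preprocessor named in $LESSOPEN on one file and
# writes the result to stdout, so it can be redirected or piped.
lessopen_filter() {
    cmd=${LESSOPEN-}
    case $cmd in
        \|*) ;;
        *)   echo "LESSOPEN is unset or not pipe-style" >&2; return 1 ;;
    esac
    # Drop the leading "|" or "||" and an optional "-" that follows.
    while [ "${cmd#\|}" != "$cmd" ]; do cmd=${cmd#\|}; done
    cmd=${cmd#-}
    # less substitutes the file name for %s. Here %s becomes "$1" so the
    # name reaches the command as a single quoted argument.
    sh -c "$(printf "$cmd" '"$1"')" sh "$1"
}
```

With that defined, the text that less displays could be sent to a file and opened in the editor:

    $ lessopen_filter file.pdf > /tmp/file.txt && xed /tmp/file.txt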

    --
    Keith Thompson (The_Other_Keith) [email protected]
    Working, but not speaking, for Philips
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Eli the Bearded@21:1/5 to [email protected] on Thu Apr 21 17:55:05 2022
    In comp.unix.shell, Ottavio Caruso <[email protected]> wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will
    show all the pdf garbage.

    pdftotext is the typical tool for turning a PDF into text (but it
    doesn't OCR images, so it's not always an ideal tool). It sounds like
    you are getting a 'strings' like output from less. That's probably less
    useful, unless you really do want the comments in the PDF.
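
For reference, a minimal sketch of calling pdftotext so that its output goes to stdout rather than to a file. It assumes pdftotext from poppler is installed; the wrapper name pdf_text is made up, and whether -layout gives better results depends on the document.

```shell
# pdf_text: hypothetical wrapper name. Assumes pdftotext (poppler) is
# installed. Writes the extracted text to stdout.
#   -layout   keep the physical page layout as far as possible
#   -nopgbrk  do not insert form feed characters between pages
#   "-"       as the output name sends the text to stdout
pdf_text() {
    pdftotext -layout -nopgbrk "$1" -
}
```

Typical use would be something like:

    $ pdf_text file.pdf > /tmp/file.txt && xed /tmp/file.txt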

    Elijah
    ------
    "views" binary files in vim sometimes

  • From Javier@21:1/5 to Ottavio Caruso on Thu Apr 21 19:53:50 2022
    Ottavio Caruso <[email protected]> wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:

    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    There is vipe from the moreutils package, which opens the data flowing
    through a pipeline in an external program for editing or viewing.

    https://joeyh.name/code/moreutils/

    $ less file.pdf | EDITOR=xed vipe

    and 2)

    Any way to clean up unprintable characters before sending them to xed?

    For that you have GNU strings.

    $ less file.pdf | strings | EDITOR=xed vipe

    But since these are pdf files, I would rather use dedicated tools like
    pdftotext from poppler or pdf2txt from pdfminer

    http://www.unixuser.org/~euske/python/pdfminer/

    lesspipe.sh (as Keith Thompson suggested to look at) uses pdftotext for
    pdf files.
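
If moreutils is not available, a rough stand-in for the "pipe into an editor" step can be sketched in plain shell. This is not how vipe is implemented: the name to_editor is hypothetical, the edited text is not passed on to stdout, and a terminal editor would not get the keyboard because stdin is the pipe. It is meant for GUI editors such as xed or pluma. It also assumes mktemp is installed.

```shell
# to_editor: hypothetical helper, not part of moreutils.
# Saves stdin to a temporary file and opens that file in an editor.
# The editor is the first argument, else $EDITOR, else vi.
to_editor() {
    editor=${1:-${EDITOR:-vi}}
    tmp=$(mktemp "${TMPDIR:-/tmp}/to_editor.XXXXXX") || return 1
    cat > "$tmp"
    # The file is left in place on purpose: a GUI editor may return
    # before it has finished reading it.
    "$editor" "$tmp"
}
```

For example:

    $ less file.pdf | strings | to_editor xed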

  • From Javier@21:1/5 to Javier on Thu Apr 21 20:25:42 2022
    Javier <[email protected]> wrote:
    or pdf2txt from pdfminer

    http://www.unixuser.org/~euske/python/pdfminer/

    Unfortunately it is extremely hard to make pdf2txt work inside a
    pipeline, as it asks for seekable output.

  • From Ottavio Caruso@21:1/5 to Chris Elvidge on Fri Apr 22 10:55:04 2022
    On 21/04/2022 17:59, Chris Elvidge wrote:
    On 21/04/2022 15:47, Ottavio Caruso wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    and 2)

    Any way to clean up unprintable characters before sending them to xed?


    without using less : pdftotext?


    pdftotext is horrible. It doesn't remove odd characters and it messes up
    the formatting.

    --
    Ottavio Caruso

  • From Ottavio Caruso@21:1/5 to Javier on Fri Apr 22 10:59:16 2022
    On 22/04/2022 01:53, Javier wrote:
    Ottavio Caruso <[email protected]> wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.

    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:

    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    There is vipe from the moreutils package, which opens the data flowing
    through a pipeline in an external program for editing or viewing.

    https://joeyh.name/code/moreutils/

    $ less file.pdf | EDITOR=xed vipe

    and 2)

    Any way to clean up unprintable characters before sending them to xed?

    For that you have GNU strings.

    $ less file.pdf | strings | EDITOR=xed vipe


    Fantastic. This one does the job. Thanks!


    --
    Ottavio Caruso

  • From Janis Papanagnou@21:1/5 to Ottavio Caruso on Fri Apr 22 14:36:34 2022
    On 22.04.2022 11:55, Ottavio Caruso wrote:
    On 21/04/2022 17:59, Chris Elvidge wrote:
    On 21/04/2022 15:47, Ottavio Caruso wrote:
    I can view a simplified version of a pdf file with less:

    $ less file.pdf

    If I press "v" an editor will open but it will show all the pdf garbage.
    I would like to redirect the formatted output from less to a text
    editor, for example xed or pluma.

    1)Is there a more elegant way than:


    $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

    ?

    and 2)

    Any way to clean up unprintable characters before sending them to xed?


    without using less : pdftotext?


    pdftotext is horrible. It doesn't remove odd characters and it messes up
    the formatting.


    Interesting. I just tried it on a letter written in German and one
    written in Greek, both certainly with a lot of "odd" characters (from an
    ASCII point of view); both were perfectly readable. I'm curious what
    the characters are that mess up the formatting in your environment.
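
One way to find out which bytes are involved is to count everything outside printable ASCII in the extracted text. The sketch below is an illustration only: the function name odd_bytes is made up, and treating tab, newline and carriage return as ordinary is an assumption. Bytes of valid UTF-8 text (German umlauts, Greek letters) will also be listed, as values 80 and above.

```shell
# odd_bytes: hypothetical helper name.
# Reads stdin, drops tab, LF, CR and printable ASCII (space through ~),
# then prints each remaining byte value in hex with a count.
odd_bytes() {
    LC_ALL=C tr -d '\11\12\15\40-\176' |
        od -An -v -tx1 |
        tr -s ' ' '\n' |
        sed '/^$/d' |
        sort |
        uniq -c
}
```

For example, on the output of pdftotext:

    $ pdftotext file.pdf - | odd_bytes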

    Janis
