
Re: Redirect output from less to text editor


Spiros Bousbouras

Apr 21, 2022, 11:07:17 AM
On Thu, 21 Apr 2022 15:47:32 +0100
Ottavio Caruso <ottavio2006...@yahoo.com> wrote:
> I can view a simplified version of a pdf file with less:
>
> $ less file.pdf
>
> If I press "v" an editor will open but it will show all the pdf garbage.
>
> I would like to redirect the formatted output from less to a text
> editor, for example xed or pluma.
>
> 1)Is there a more elegant way than:
>
>
> $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt
>
> ?

If the editors in question support an option to read from stdin, then you can
use that through a pipe. Otherwise I can't think of anything. Beyond that,
I don't think that less really does any formatting; it just wraps lines
and perhaps a bit more. A decent text editor should be able to do this on its
own, so I'm not clear why you want to involve less. Note that there is also
the fmt utility to wrap lines.
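
A minimal sketch of the pipe idea, assuming an editor that reads its buffer from stdin. vim does this when given "-" as the file name; whether xed or pluma accept the same is an assumption to check against their own documentation. The function name wrap_for_editor is made up for illustration.

```shell
#!/bin/sh
# wrap_for_editor (hypothetical name): re-wrap text from stdin with fmt.
# The optional first argument is the maximum line width (default 72).
wrap_for_editor() {
    fmt -w "${1:-72}"
}

# Usage (not run here): feed the wrapped text to an editor that reads stdin.
#   less file.pdf | wrap_for_editor 72 | vim -
```

With an editor that cannot read stdin, the temporary file from the original question remains the fallback.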

> and 2)
>
> Any way to clean up unprintable characters before sending them to xed?

As long as you have a clear idea what the unprintable characters are, then
you can use sed. For example,
sed -e 's/\o000//g' -e 's/\o001//g' file.pdf

will omit octets with value 0 or 1. I think the \o000 syntax is GNU
specific.
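
If the goal is instead to drop everything that is not printable, a whitelist with tr avoids the GNU-specific escapes. This is only a sketch under the assumption that the text is plain ASCII: bytes belonging to multi-byte UTF-8 characters would be deleted too. The name strip_unprintable is made up for illustration.

```shell
#!/bin/sh
# strip_unprintable (hypothetical name): keep tab (\11), newline (\12) and
# printable ASCII (\40 to \176); delete every other byte.
# Assumption: ASCII-only text. Non-ASCII bytes, including UTF-8, are removed.
strip_unprintable() {
    LC_ALL=C tr -cd '\11\12\40-\176'
}

# Usage (not run here):
#   less file.pdf | strip_unprintable > /tmp/file.txt
```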

--
vlaho.ninja/prog

Chris Elvidge

Apr 21, 2022, 12:59:56 PM
On 21/04/2022 15:47, Ottavio Caruso wrote:
> I can view a simplified version of a pdf file with less:
>
> $ less file.pdf
>
> If I press "v" an editor will open but it will show all the pdf garbage.
>
> I would like to redirect the formatted output from less to a text
> editor, for example xed or pluma.
>
> 1)Is there a more elegant way than:
>
>
> $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt
>
> ?
>
> and 2)
>
> Any way to clean up unprintable characters before sending them to xed?
>

Without using less: pdftotext?

--
Chris Elvidge
England

Eli the Bearded

Apr 21, 2022, 1:55:10 PM
In comp.unix.shell, Ottavio Caruso <ottavio2006...@yahoo.com> wrote:
> I can view a simplified version of a pdf file with less:
>
> $ less file.pdf
>
> If I press "v" an editor will open but it will
> show all the pdf garbage.

pdftotext is the typical tool for turning a PDF into text (but it
doesn't OCR images, so it's not always an ideal tool). It sounds like
you are getting a 'strings'-like output from less. That's probably less
useful, unless you really do want the comments in the PDF.
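
A hedged sketch of the pdftotext route, assuming the poppler version of the tool is installed. The wrapper name pdf_text is made up for illustration; -layout, -nopgbrk and "-" for stdout are pdftotext options.

```shell
#!/bin/sh
# pdf_text (hypothetical name): print the text of a PDF on stdout.
pdf_text() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdf_text: pdftotext (poppler) not found" >&2
        return 127
    fi
    # -layout tries to keep the physical page layout, -nopgbrk leaves out the
    # form feeds between pages, and "-" as the output name means stdout.
    pdftotext -layout -nopgbrk "$1" -
}

# Usage (not run here):
#   pdf_text file.pdf > /tmp/file.txt && xed /tmp/file.txt
```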

Elijah
------
"views" binary files in vim sometimes

Keith Thompson

Apr 21, 2022, 2:07:36 PM
Ottavio Caruso <ottavio2006...@yahoo.com> writes:
> I can view a simplified version of a pdf file with less:
>
> $ less file.pdf

If that works (it doesn't for me) then you probably have less configured
to use an input preprocessor via $LESSOPEN. "man less" for details.

> If I press "v" an editor will open but it will show all the pdf garbage.
>
> I would like to redirect the formatted output from less to a text
> editor, for example xed or pluma.
>
> 1)Is there a more elegant way than:
>
>
> $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt
>
> ?
>
> and 2)
>
> Any way to clean up unprintable characters before sending them to xed?

You can probably use the same filter used by LESSOPEN.
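
One way to try this, sketched under the assumption that $LESSOPEN uses the input-pipe form from the less manual ("| command %s"). The function name lessopen_filter is made up for illustration, and file names containing a single quote are not handled.

```shell
#!/bin/sh
# lessopen_filter (hypothetical name): run the preprocessor named in
# $LESSOPEN on one file, outside of less, and print its output.
lessopen_filter() {
    file=$1
    cmd=${LESSOPEN-}
    case $cmd in
        '|'*) ;;    # input-pipe form: the command writes the text to stdout
        *) echo "lessopen_filter: LESSOPEN is not an input pipe" >&2
           return 1 ;;
    esac
    # Drop the leading "|" (or "||"), an optional "-" and any spaces.
    while :; do
        case $cmd in
            '|'*|'-'*|' '*) cmd=${cmd#?} ;;
            *) break ;;
        esac
    done
    # Replace %s with the single-quoted file name, then run the command.
    cmd=$(printf "$cmd" "'$file'") || return 1
    sh -c "$cmd"
}

# Usage (not run here):
#   lessopen_filter file.pdf > /tmp/file.txt && xed /tmp/file.txt
```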

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Javier

Apr 21, 2022, 8:53:59 PM
Ottavio Caruso <ottavio2006...@yahoo.com> wrote:
> I can view a simplified version of a pdf file with less:
>
> $ less file.pdf
>
> If I press "v" an editor will open but it will show all the pdf garbage.
>
> I would like to redirect the formatted output from less to a text
> editor, for example xed or pluma.
>
> 1)Is there a more elegant way than:
>
> $ less file.pdf > /tmp/file.txt && xed /tmp/file.txt

There is vipe from the moreutils package, which lets you view or edit the
data flowing through a pipeline in an external editor.

https://joeyh.name/code/moreutils/

$ less file.pdf | EDITOR=xed vipe
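
For systems without moreutils, the idea can be imitated roughly as follows. This is a sketch, not vipe itself: the name edit_in_pipe is made up, and it assumes $EDITOR is a GUI editor that stays in the foreground until editing is finished. A terminal editor would also need its input and output reconnected to the terminal.

```shell
#!/bin/sh
# edit_in_pipe (hypothetical name): rough imitation of the vipe idea.
# Assumption: $EDITOR blocks until the user has finished editing.
edit_in_pipe() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                  # spool stdin into a temporary file
    ${EDITOR:-vi} "$tmp" >&2      # keep editor chatter out of the pipeline
    status=$?
    [ "$status" -eq 0 ] && cat "$tmp"   # pass the edited text downstream
    rm -f "$tmp"
    return "$status"
}

# Usage (not run here):
#   less file.pdf | EDITOR=xed edit_in_pipe > /tmp/file.txt
```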

> and 2)
>
> Any way to clean up unprintable characters before sending them to xed?

For that you have GNU strings.

$ less file.pdf | strings | EDITOR=xed vipe

But since these are PDF files, I would rather use dedicated tools such as
pdftotext from poppler or pdf2txt from pdfminer:

http://www.unixuser.org/~euske/python/pdfminer/

lesspipe.sh (the LESSOPEN filter Keith Thompson suggested looking at) uses
pdftotext for PDF files.

Javier

Apr 21, 2022, 9:25:52 PM
Javier <inv...@invalid.invalid> wrote:
> or pdf2txt from pdfminer
>
> http://www.unixuser.org/~euske/python/pdfminer/

Unfortunately, pdf2txt is extremely hard to get working inside a
pipeline, as it asks for seekable output.

Janis Papanagnou

Apr 22, 2022, 8:36:40 AM
On 22.04.2022 11:55, Ottavio Caruso wrote:
> pdftotext is horrible. It doesn't remove odd characters and messes up
> with formatting.
>

Interesting. I just tried it on a letter written in German and one
written in Greek, both with plenty of "odd" characters (from an
ASCII point of view); both came out perfectly readable. I'm curious
which characters mess up the formatting in your environment.

Janis
