Forum: >>> Magnum BBS <<<

Needed tool for vision-impaired - was [Re: PDF Editor for Debian]

From Richard Owlett@21:1/5 to Richard on Mon Jun 24 19:20:01 2024

On 06/24/2024 12:35 AM, Richard wrote:

> Hello,
> this very much depends on what you are expecting it to do. In general, PDFs are only meant to be viewed - and printed - they were never meant for anything else. ...

Second sentence should read:

... only meant to be viewed by those with *NORMAL* vision ...

I'm attempting to read a USDA document.[1]
The printed version of this document is marginally readable.

Tools such as "Atril Document Viewer" provide selectable magnification.
For this particular document and monitor, 150% is comfortable, but reading the document at that zoom requires re-positioning the view 500 to 600 times.

For _this_ document, Atril can select all the text on a page so that it
pastes into a Pluma document in a "reasonable" manner.

It will:
a. ignore actual graphics.
b. put title/headings/??? on a separate line.
c. treat all text between full page-width title/headings/??? as a
logical unit.
It will not:
1. put a blank line between paragraphs.
2. put a blank line above/below lines containing title/headings/???.
3. identify superscripts in some manner.

All this suggests that it should be able to extract text from a PDF and
create an HTML document likely using only , , , and <li> in
its <body>.
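
A minimal sketch of the text-to-HTML step described above, in Python. It assumes the pasted text looks the way the lists describe: headings on their own lines and no blank lines between paragraphs. The heading test (`looks_like_heading`), the choice of <h2> and <p>, and all function names are illustrative assumptions, not something Atril or Pluma provides.

```python
import html


def looks_like_heading(line: str, max_len: int = 60) -> bool:
    """Hypothetical heuristic: a short line that does not end in punctuation."""
    stripped = line.strip()
    return 0 < len(stripped) <= max_len and stripped[-1] not in ".,;:?!"


def text_to_html(text: str, title: str = "Extracted text") -> str:
    """Wrap plain text in a minimal HTML page of headings and paragraphs."""
    body = []       # finished HTML fragments, in order
    paragraph = []  # lines collected since the last heading or blank line

    def flush():
        # Emit the collected lines as one paragraph, then start a new one.
        if paragraph:
            body.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            flush()
        elif looks_like_heading(line):
            flush()
            body.append("<h2>" + html.escape(line) + "</h2>")
        else:
            paragraph.append(line)
    flush()

    head = f"<head><meta charset='utf-8'><title>{html.escape(title)}</title></head>"
    page = ["<!DOCTYPE html>", "<html lang='en'>", head, "<body>", *body, "</body>", "</html>"]
    return "\n".join(page) + "\n"
```

Everything between two detected headings ends up in a single paragraph, which mirrors item (c). Real paragraph breaks and superscripts would still need information that the pasted text does not carry.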

[1] https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
_Thrifty Food Plan, 2021_
Food and Nutrition Service
August 2021
FNS-916

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Nicolas George@21:1/5 to All on Mon Jun 24 19:30:01 2024

Karen Lewellen (12024-06-24):

> Good afternoon.
> I am providing another option that might help here.
> robobraille,
>
> www.robobraille.org
> Provides services, free of charge, that will convert pdf files to a number of different formats, including .html
> They provide audio, mobi, and convert epub files too..but I digress.
> As a test, consider sending your file to
> convert at robobraille.org
> correctly of course.
> in the subjectline put html
> leaving the body blank, and attach the file.
> See if the .html file returned meets your needs.

Interesting.

Do you know how they fare with math? I mean real, non-trivial formulas
produced by LaTeX like you would find in
https://arxiv.org/abs/1803.05929 ?

(I know, I could test. I will if you do not know the answer.)

Regards,

--
Nicolas George


From Karen Lewellen@21:1/5 to Richard Owlett on Mon Jun 24 19:30:01 2024

Good afternoon.
I am providing another option that might help here: RoboBraille,
www.robobraille.org
It provides services, free of charge, that will convert PDF files to a
number of different formats, including .html.
They provide audio and mobi, and convert epub files too... but I digress.
As a test, consider sending your file to
convert at robobraille.org
(written correctly, of course).
In the subject line put html,
leave the body blank, and attach the file.
See if the .html file returned meets your needs.
Best,
Karen
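
The request described above can be sketched with Python's standard `email` package. This only builds the message and does not send it. The address `convert@robobraille.org` is a reading of the obfuscated "convert at robobraille.org" above, and the function name and the sender address in the usage note are hypothetical.

```python
from email.message import EmailMessage
from pathlib import Path


def build_conversion_request(pdf_path: str, sender: str,
                             recipient: str = "convert@robobraille.org",
                             target_format: str = "html") -> EmailMessage:
    """Build an email with the target format as subject and the PDF attached."""
    path = Path(pdf_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = target_format  # "html", as suggested in the post
    # No text part is added, which leaves the body blank.
    msg.add_attachment(path.read_bytes(), maintype="application",
                       subtype="pdf", filename=path.name)
    return msg
```

A call such as `build_conversion_request("TFP2021.pdf", "reader@example.org")` returns a message that could then be handed to `smtplib` or saved and opened in a mail client. How the service responds to it is not something this sketch can confirm.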



From Richard Owlett@21:1/5 to Karen Lewellen on Wed Aug 7 14:50:02 2024

On 06/24/2024 12:22 PM, Karen Lewellen wrote:

> Good afternoon.
> I am providing another option that might help here.
> robobraille,
>
> www.robobraille.org
> Provides services, free of charge, that will convert pdf files to a
> number of different formats, including .html
> They provide audio, mobi, and convert epub files too..but I digress.
> As a test, consider sending your file to
> convert at robobraille.org
> correctly of course.
> in the subjectline put html
> leaving the body blank, and attach the file.
> See if the .html file returned meets your needs.
> Best,
> Karen

I went to the site shortly after you posted.
*MY* browser (SeaMonkey 2.49.4 {32 bit Linux}) choked on it.
I didn't get a chance to visit local library to try another browser.
Forgot I had a copy of Firefox 68.10.0esr on my machine.
It ran fine.

I converted "https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf" to both text and HTML.
The text version seems perfect.
The HTML version has a problem: the titles of several tables near the
end of the file are missing. There are 15 tables one after another. All table *contents*
came through OK, but only the last table kept its associated title.
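
One way to spot this kind of gap in a converted file is sketched below in Python, using only the standard library. It assumes that a table's title would appear as a non-empty <caption> element inside the table. That is an assumption: the converter may instead put titles in a preceding paragraph or heading, in which case this check would flag every table. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableCaptionAudit(HTMLParser):
    """Record, for each <table>, whether it contains a non-empty <caption>."""

    def __init__(self):
        super().__init__()
        self.has_caption = []     # one bool per table, in document order
        self._open = []           # stack of indices of currently open tables
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.has_caption.append(False)
            self._open.append(len(self.has_caption) - 1)
        elif tag == "caption" and self._open:
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()
        elif tag == "caption":
            self._in_caption = False

    def handle_data(self, data):
        # Only text inside a caption of the innermost open table counts.
        if self._in_caption and self._open and data.strip():
            self.has_caption[self._open[-1]] = True


def tables_missing_caption(markup: str) -> list:
    """Return the 1-based positions of tables that have no caption text."""
    audit = TableCaptionAudit()
    audit.feed(markup)
    audit.close()
    return [i + 1 for i, ok in enumerate(audit.has_caption) if not ok]
```

Run against the converted HTML, the returned list would give the positions of the untitled tables, which could go straight into a report to the site.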

I'll give www.robobraille.org a heads-up about it.
As I've a peculiar local configuration of SeaMonkey, could another SM
user run a quick check so that I can report if their site has a problem
with SeaMonkey?

TIA



From Felix Miata@21:1/5 to All on Wed Aug 7 19:50:02 2024

Richard Owlett composed on 2024-08-07 07:45 (UTC-0500):

> I went to the site shortly after you posted.
> *MY* browser (SeaMonkey 2.49.4 {32 bit Linux}) choked on it.
> I didn't get a chance to visit local library to try another browser.
> Forgot I had a copy of Firefox 68.10.0esr on my machine.
> It ran fine.
>
> I converted "https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf"
> to both text and HTML.
> The text version seems perfect.
> The HTML version has problem of missing titles to several tables near
> end of file. There are 15 tables one after another. All table *contents*
> came thru OK. Only the last one had its associated title.
>
> I'll give www.robobraille.org a heads-up about it.
> As I've a peculiar local configuration of SeaMonkey, could another SM
> user run a quick check so that I can report if their site has a problem
> with SeaMonkey?

The PDF loads fine here in current SeaMonkey 2.53.18.2 64bit. I haven't used 2.49.x in five or so years. I use the static build hosted on http://archive.seamonkey-project.org/releases/ .
--
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata


From Richard Owlett@21:1/5 to Felix Miata on Wed Aug 7 20:40:01 2024

On 08/07/2024 12:44 PM, Felix Miata wrote:

> Richard Owlett composed on 2024-08-07 07:45 (UTC-0500):
>
>> I went to the site shortly after you posted.
>> *MY* browser (SeaMonkey 2.49.4 {32 bit Linux}) choked on it.
>> I didn't get a chance to visit local library to try another browser.
>> Forgot I had a copy of Firefox 68.10.0esr on my machine.
>> It ran fine.
>>
>> I converted
>> "https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf"
>> to both text and HTML.
>> The text version seems perfect.
>> The HTML version has problem of missing titles to several tables near
>> end of file. There are 15 tables one after another. All table *contents*
>> came thru OK. Only the last one had its associated title.
>>
>> I'll give www.robobraille.org a heads-up about it.
>> As I've a peculiar local configuration of SeaMonkey, could another SM
>> user run a quick check so that I can report if their site has a problem
>> with SeaMonkey?
>
> The PDF loads fine here in current SeaMonkey 2.53.18.2 64bit. I haven't used 2.49.x in five or so years. I use the static build hosted on http://archive.seamonkey-project.org/releases/ .

Thank you.


From Richard Owlett@21:1/5 to Nicolas George on Thu Aug 8 14:20:02 2024

On 06/24/2024 12:29 PM, Nicolas George wrote:

> Karen Lewellen (12024-06-24):
>
>> Good afternoon.
>> I am providing another option that might help here.
>> robobraille,
>>
>> www.robobraille.org
>> Provides services, free of charge, that will convert pdf files to a number
>> of different formats, including .html
>> They provide audio, mobi, and convert epub files too..but I digress.
>> As a test, consider sending your file to
>> convert at robobraille.org
>> correctly of course.
>> in the subjectline put html
>> leaving the body blank, and attach the file.
>> See if the .html file returned meets your needs.
>
> Interesting.
>
> Do you know how they fare with math? I mean real, non-trivial formulas produced by LaTeX like you would find in
> https://arxiv.org/abs/1803.05929 ?
>
> (I know, I could test. I will if you do not know the answer.)
>
> Regards,

While looking for something else I found https://www.robobraille.org/resources/software-and-tools/#math

Relevant &/or useful?
HTH

