On 7/25/2022 2:19 PM, pinnerite wrote:
Since 2010 I have been scanning documents and creating pdfs from them
before shredding them.
Most of them were scanned by a Fujitsu cut-sheet feeder.
I had to use Fujitsu's software but that only runs under Windows XP.
I have had to retain a virtual XP ever since.
Today I had to trace back through the pdfs to uncover some information.
Some I could but others refused to open with the Mint Document Viewer.
Naturally I turned to Google. Most suggestions were useless but I hoped
that xpdf might turn out to be a winner. Not in Mint's repository. :(
Next I tried opening one file in LibreOffice Writer and it opened! :)
But it was the only one. :(
Is there anyone that has cracked this?
TIA
Do you have backup copies of the file system in question?
Does the file system throw errors when this part of the disk is accessed?
Run the PDF tool in a terminal. Does the PDF tool throw
errors when reading the file?
Before we had PDF, there was PostScript. It had little in the way
of internal protections. The parser could stop on a damaged structure
and throw a weird error, and you might not be able to guess what
had happened.
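The PDF format consists of "objects". Each stream object declares its
byte count in a /Length entry, and the cross-reference table records the
byte offset of every object. So at some level, parsing could stop because
"this object has too many bytes or not enough bytes". However, the actual
benefit of this feature is questionable, because it only tells us that
something is wrong. It does not give back the damaged bytes, so it does
not allow us to actually repair a document.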
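
As a rough illustration of the byte-count idea, here is a minimal Python
sketch (my own, not part of mutool or any other tool). It looks for stream
objects whose dictionary has a direct "/Length N" entry and checks whether
the "endstream" keyword really sits N bytes after the stream starts. The
name check_stream_lengths is made up for this example. It assumes simple
dictionaries: streams with an indirect length ("/Length 12 0 R") or with
nested dictionaries after /Length are skipped, not checked.

```python
import re
import sys

# A direct "/Length N" entry, the rest of a flat dictionary, then the
# "stream" keyword and its end-of-line. The \b plus the negative lookahead
# keep us from misreading an indirect reference such as "/Length 12 0 R".
STREAM_RE = re.compile(
    rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)[^<>]*?>>\s*stream\r?\n"
)


def check_stream_lengths(data: bytes):
    """Return a list of (body_offset, declared_length, ok) tuples.

    ok is True when "endstream" is found right after the declared number
    of bytes (allowing for an end-of-line before the keyword).
    """
    results = []
    for m in STREAM_RE.finditer(data):
        declared = int(m.group(1))
        body_start = m.end()
        tail = data[body_start + declared: body_start + declared + 16]
        ok = tail.lstrip(b"\r\n").startswith(b"endstream")
        results.append((body_start, declared, ok))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        pdf_bytes = fh.read()
    for offset, declared, ok in check_stream_lengths(pdf_bytes):
        status = "ok" if ok else "LENGTH MISMATCH"
        print(f"stream at byte {offset}: /Length {declared}: {status}")
```

A mismatch only says that the stream and its declared length disagree. It
cannot say which bytes are wrong or missing.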
When we scan to PDF, the PDF adds little in the way of value.
The wrapper is pretty small, consisting of a set of objects that
would be very similar from page to page.
The majority of a page is the scanned image, typically JPEG data or the
fax-style (CCITT) compressed data also used inside TIFF files.
The scanned image can be at a higher resolution than is
absolutely necessary.
The PDF will say "place this oversize image on an A4", so we
might have a default scale to use when printing.
But once you strip off the auxiliary structures, it's
really just an image format underneath.
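
If the wrapper is damaged but the image data is intact, one can sometimes
pull the images straight out of the file. The following is a naive Python
sketch of that idea, under the assumption that the scans were stored as
plain JPEG streams with no extra encoding layered on top (I do not know
what the Fujitsu software actually wrote). It simply searches for the JPEG
start marker (FF D8 FF) and the next end marker (FF D9). The function name
carve_jpegs and the output file names are made up for this example. An
image with an embedded thumbnail would be cut short by this approach.

```python
import sys

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus first marker byte
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(data: bytes) -> list[bytes]:
    """Return every SOI..EOI byte run found in data, in file order."""
    images = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start == -1:
            break
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break  # start marker with no end marker: probably truncated
        images.append(data[start:end + len(EOI)])
        pos = end + len(EOI)
    return images


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        found = carve_jpegs(fh.read())
    for i, jpeg in enumerate(found):
        name = f"page_{i:03d}.jpg"  # hypothetical output naming
        with open(name, "wb") as out:
            out.write(jpeg)
        print(f"wrote {name} ({len(jpeg)} bytes)")
```

Whether the carved files open in an image viewer depends on whether the
compressed data itself survived.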
With compressed image formats, if a single compressed bit gets
corrupted, there is "error multiplication": everything decoded after
that point can come out wrong. It may be difficult
for a human inspecting the file to even notice that a chunk of the image
data is missing, because the data is binary. But with the byte count,
there is some slim chance we can notice that the underlying
image is "short a few bytes".
*******
I recommend this tool, available in quite a few distros.
It can even do things like convert a binary PDF into a text
PDF (all except for one line, which remains binary). Such transformations
only work if the file is free of corruption, of course. I
sometimes convert files from the binary representation to a text
representation, to study the objects in them more easily.
https://mupdf.com/docs/mutool.html
mutool info some.pdf
mutool convert -F pdf -O decompress,clean -o null.pdf native_plant_id.pdf
Ghostscript has also been changed. They rewrote the PDF interpreter
in C for speed (the old one was written in PostScript). But the nature
of PDF does not particularly allow improvements in how damage is
handled, so I have my doubts that this
new interpreter will help with corruption issues.
https://ghostscript.com/blog/pdfi.html
*******
Use the file command, and see if it says the file is a PDF.
file some.pdf # Does it say PDF, or is it an entirely different type ???
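
For a whole folder of scans, the same kind of sanity check can be
scripted. This is a small Python sketch of my own, not what the file
command does internally. It checks for the "%PDF-" header near the start
of the file and the "%%EOF" marker near the end. The 1024-byte windows are
an assumption based on how lenient PDF readers commonly behave, and the
function name pdf_markers is made up for this example.

```python
import sys


def pdf_markers(data: bytes) -> tuple[bool, bool]:
    """Return (has_header, has_eof_marker) for the given file contents."""
    has_header = b"%PDF-" in data[:1024]   # header expected near the start
    has_eof = b"%%EOF" in data[-1024:]     # end marker expected near the end
    return has_header, has_eof


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            header, eof = pdf_markers(fh.read())
        print(f"{path}: header={'yes' if header else 'NO'} "
              f"eof={'yes' if eof else 'NO'}")
```

A file with a header but no end marker has most likely been truncated. A
file with neither is probably not a PDF at all.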
Paul