Scanned PDF to Formatted MS Word

Started by Syphon, October 06, 2011, 01:54:18 PM

Syphon

I know I am going to get laughed off the forums for asking this question.

Here is the problem:
A client gave us a hard copy of a 200+ page manual. They want us to scan it to PDF (which I have already done) and then convert it to a completely formatted Word file. In other words, the client does not want to set up (typeset and format) the manual; they want it done without any effort, and they want it perfect.

So the question is:
Is there a way of doing this?
To my knowledge there is no application that converts a hard copy into an already formatted file that is ready to go.
Freelance Designer | Illustrator | Photo Editor
iMac • Mac OSX 10.15 Catalina
Affinity Publisher • Affinity Photo • Affinity Designer
Adobe InDesign • Adobe Photoshop • Adobe Illustrator
Adobe Acrobat

born2print

OCR would get you started, but there is no magic pixel-to-rich-text software that I know of.
And I'm not laughing! :goodpost:
My lips are moving and the sound's coming out
The words are audible but I have my doubts
That you realize what has been said

Ear

Ya, no laughing, bro. This is a seriously F'd up request they're making of you.

As b2p stated, OCR would get you started but damn, this project sounds like a daily migraine. Envy you I do not.  :death:
"... profile says he's a seven-foot tall ex-basketball pro, Hindu guru drag queen alien." ~Jet Black

t-pat

In Acrobat X there is "File...save as... Microsoft Word" I shit you not.
vdp donkey
gmc inspire • sarcasm while you wait

born2print

No doubt an "embed in" more than a "save as" I would bet.

frailer

Sometimes cruising threads pays off. My wife sends me a TIFF of a guy's CV that they need to be editable. (Part of a submission her company's making). When I get this thing, it's ... groan...
But, armed with the knowledge from this thread: open in Photoshop > save as PDF > Acrobat > OCR. A bit rooted up in the text, but.. editable! For several milliseconds I felt like some sort of pp nerd. Then the feeling faded.
Forgotten good guys: Dennis Ritchie, Burrell Smith, Bill Atkinson, Richard Stallman
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now just an honorary member.

David

You will need to re-scan with OCR to be able to get any text at all out of that.

And after that, you will need to go through the pages to correct any OCR errors.
Is the type on the pages something simple, like Times Roman? Any charts or graphs?
Some type OCRs better than others (Helvetica and Times work best; cursive or fancy fonts do not). Charts and graphs don't OCR at all.

Good luck.
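
For the clean-up pass after OCR, part of the grunt work can be scripted once the text has been exported as plain text. What follows is only a rough Python sketch, not something anyone in this thread used: the function name `clean_ocr_text` is made up for illustration, and it assumes blank lines in the export mark paragraph breaks. It rejoins words that were hyphenated across line ends and unwraps hard line breaks. It will not catch misread characters, so the pages still need to be proofread against the original.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy plain text exported from an OCR pass (hypothetical helper)."""
    # Normalise Windows/old-Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split across a line end: "manu-\nal" -> "manual".
    # Caveat: a genuine compound broken at the hyphen ("well-\nknown")
    # is joined too, so the result still needs a human read-through.
    text = re.sub(r"(\w)-\n(?=[a-z])", r"\1", text)

    # Assumption: blank lines separate paragraphs. Inside a paragraph,
    # collapse line breaks and runs of spaces into single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and put one blank line between the rest.
    return "\n\n".join(p for p in cleaned if p)
```

Something like `clean_ocr_text(open("manual.txt", encoding="utf-8").read())` would be the typical call, where `manual.txt` is a placeholder name for the exported text.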

Prepress guy - Retired - Working from home
Livin' la Vida Loca

frailer

I ran my eye over it quickly, and shot it back to her. She's aware it needs tweaking against the orig. But, SWMBO seems happy. Can't ask for more than that.  :cheesy:

DigiCorn

Quote from: t-pat on October 06, 2011, 02:01:54 PM
In Acrobat X there is "File...save as... Microsoft Word" I shit you not.
There's a "Save for Microsoft Office" in Illustrator too.
"There's been a lot of research recently on how hard it is to dislodge an impression once it's been implanted in someone's mind. (This is why political attack ads don't have to be true to be effective. The other side can point out their inaccuracies, but the voter's mind privileges the memory of the original accusation, which was juicier than any counterargument ever could be.)"
― Johnny Carson

"Selling my soul would be a lot easier if I could just find it."
– Nikki Sixx

"Always do sober what you said you'd do drunk. That will teach you to keep your mouth shut."
― Ernest Hemingway

Syphon

Quote from: t-pat on October 06, 2011, 02:01:54 PM
In Acrobat X there is "File...save as... Microsoft Word" I shit you not.

I already tried Export to Word and Rich Text (in Acrobat Pro 9); the result would need a LOT of work.
I should add that they want to edit the file but, as I said, they don't want to go through formatting and setting it up, which I believe they are going to have to do. This should teach them not to lose the original file.

Joe

No easy button on this. They know they can't "do it without any effort and be perfect" but they expect you to.

May the bird of paradise shit on them.
Mac OS Sonoma 14.2.1 (c) | (retired)

The seven ages of man: spills, drills, thrills, bills, ills, pills and wills.

Farabomb

Shit may be involved but I don't think it's on the customer right now.
Speed doesn't kill, rapidly becoming stationary is the problem

I'd rather have stories told than be telling stories of what I could have done.

Quote from: Ear on April 06, 2016, 11:54:16 AM
Quote from: Farabomb on April 06, 2016, 11:39:41 AM
It's more like grip, grip, grip, noise, then spin and 2 feet in and feel shame.
I once knew a plus-sized girl and this pretty much describes teh secks. :rotf:
They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.
         —Benjamin Franklin


Syphon

Here is an update.
The boss had me export the OCR text out of the PDF to a Word file and send them that.
They're just going to have to deal with what we can give them.
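
If the text ever has to be handed over without going through Acrobat's export, one fallback is to wrap the plain OCR text in a bare-bones RTF file, which Word opens directly. This is a sketch under assumptions, not what was actually done here: `to_rtf` is a made-up helper name, the font and size are arbitrary picks, and every newline simply becomes a paragraph break. It carries no formatting from the original manual, which is the part the client would still have to redo.

```python
def to_rtf(text: str) -> str:
    """Wrap plain text in a minimal RTF document (hypothetical helper)."""

    def escape(ch: str) -> str:
        # Backslash and braces are RTF syntax and must be escaped.
        if ch in "\\{}":
            return "\\" + ch
        # Treat every newline as a paragraph break.
        if ch == "\n":
            return "\\par\n"
        if ord(ch) < 128:
            return ch
        # Non-ASCII: emit \uN? escapes, where N is each UTF-16 code unit
        # written as a signed 16-bit decimal and "?" is the fallback
        # character for readers that do not understand \u.
        data = ch.encode("utf-16-be")
        units = [
            int.from_bytes(data[i:i + 2], "big", signed=True)
            for i in range(0, len(data), 2)
        ]
        return "".join(f"\\u{u}?" for u in units)

    body = "".join(escape(ch) for ch in text)
    # Header: RTF version 1, one font (Times New Roman), 12 pt (\fs24).
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\f0\fs24 "
    return header + body + "}"
```

The returned string is pure ASCII, so it can be written out with something like `open("manual.rtf", "w", encoding="ascii").write(to_rtf(text))`, where `manual.rtf` is a placeholder name.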