OpenPDFTools

Scanned PDF Text Is Not Selectable - How to Fix It with OCR

Martin Pavlič | Updated April 8, 2026 | 6 min read

Why can’t you select text in a scanned PDF?

When a document is physically scanned and saved as a PDF, the scanner captures a flat image of the page - just like taking a photo. The resulting file contains no actual text characters, only pixels arranged to look like letters. That’s why clicking anywhere in the document selects nothing: there’s no text layer for your cursor to grab.

This is one of the most common PDF frustrations. The document looks perfectly readable on screen, but it’s essentially a photograph embedded in a PDF wrapper. You can’t search it, copy from it, or let a screen reader parse it.

What is OCR and how does it fix this?

OCR (Optical Character Recognition) is a technology that analyzes the image of text and converts it into actual, machine-readable characters. The software looks at the shapes of letters, compares them to known patterns, and outputs a text layer that gets embedded back into the PDF.

After OCR processing, you get a searchable, selectable PDF - visually identical to the original, but now your cursor can highlight words, Ctrl+F can find phrases, and copy-paste works normally. Screen readers and accessibility tools can read it too.

How to apply OCR to a scanned PDF

There are several ways to add OCR to a scanned PDF, ranging from free browser tools to desktop software:

  • Convert to Word, then save as PDF: Our PDF to Word converter turns the scanned pages into an editable Word document. Once the content is in Word, the text is fully selectable, and you can re-export it to PDF with a proper text layer.
  • Adobe Acrobat (paid): The industry-standard tool. Open the PDF, go to Tools → Scan & OCR → Recognize Text, and Acrobat adds a text layer directly. Expensive but highly accurate.
  • Google Drive (free): Upload your scanned PDF to Google Drive, right-click it, and choose "Open with Google Docs." Google automatically runs OCR and opens the text in a Docs document. Works surprisingly well for clean scans.
  • Tesseract OCR (free, open-source): A powerful command-line OCR engine used by many apps. Best for developers or technical users who want a free self-hosted solution.
  • Adobe Acrobat online (limited free): Adobe offers limited free OCR processing through their online tools for users without a subscription.
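
For the Tesseract route, a minimal Python sketch of one possible workflow is shown here. It assumes two command-line tools are installed: `pdftoppm` (part of Poppler), which renders PDF pages to images, and `tesseract`, which can write a PDF with a text layer when given the `pdf` output option. The function names, file names, and the `work` directory are made up for illustration, and the sketch produces one searchable PDF per page rather than a merged file.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def rasterize_command(pdf_path: str, image_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm command that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, image_prefix]


def ocr_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a tesseract command that writes <output_base>.pdf with a text layer."""
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def make_searchable_pages(pdf_path: str, work_dir: str, lang: str = "eng") -> list[Path]:
    """Run both tools and return one searchable PDF per page (not merged)."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf_path, str(work / "page")), check=True)
    results = []
    # pdftoppm names its output page-<number>.png, zero-padded, so sorting works.
    for image in sorted(work.glob("page-*.png")):
        base = image.with_suffix("")  # tesseract appends ".pdf" itself
        subprocess.run(ocr_command(str(image), str(base), lang), check=True)
        results.append(base.with_suffix(".pdf"))
    return results


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing needs to be installed.
    print(shlex.join(rasterize_command("scan.pdf", "work/page")))
    print(shlex.join(ocr_command("work/page-1.png", "work/page-1")))
```

Calling `make_searchable_pages("scan.pdf", "work")` would run the real tools. Rendering at 300 DPI follows the scanning advice in this article, and the `lang` value should match a language pack that is actually installed.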

Tips for better OCR accuracy

OCR quality depends heavily on the quality of the original scan. Follow these tips to maximize accuracy:

  • Scan at 300 DPI or higher: Lower-resolution scans produce blurry characters that OCR engines misread. Treat 300 DPI as the minimum; 600 DPI is better for small text or detailed documents.
  • Use black-and-white or grayscale for text documents: Color scans add file size without improving OCR accuracy for plain text. Black-and-white or grayscale is sufficient for most documents.
  • Keep pages straight: Tilted or skewed pages confuse OCR software. Most modern tools can auto-deskew, but starting straight helps.
  • Avoid coffee stains and smudges: Physical marks on the document get misread as characters. Clean the original if possible.
  • Check the output: OCR is not 100% accurate. Always proofread the result, especially for numbers, punctuation, and handwritten sections.
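
To make the resolution and color advice concrete, here is a small arithmetic sketch. It relies only on the definition of a point (1/72 inch) and on the pixel count of a page. The function names are made up for illustration, the sizes are for uncompressed image data (real PDFs compress images, so actual files are much smaller), and the em-box height is an upper bound because individual letters are smaller than the font's nominal size.

```python
from __future__ import annotations


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Height in pixels of a font's em box: 1 point is 1/72 inch."""
    return font_size_pt * dpi / 72


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed image size; 1 bit = black-and-white, 8 = grayscale, 24 = color."""
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        print(f"10 pt text at {dpi} DPI: about {em_height_px(10, dpi):.1f} px tall")

    width, height = page_pixels(8.5, 11, 300)  # US Letter page
    for label, bits in (("black-and-white", 1), ("grayscale", 8), ("color", 24)):
        size_mb = raw_size_bytes(width, height, bits) / 1_000_000
        print(f"{label}: {size_mb:.1f} MB uncompressed")
```

Doubling the resolution doubles the height of every character in pixels but quadruples the pixel count, and a 24-bit color page carries 24 times the raw data of a 1-bit black-and-white page at the same resolution.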

After OCR: reduce file size if needed

OCR processing can sometimes increase PDF file size because it adds an invisible text layer while keeping the original page image. If your resulting file is too large, use our PDF compressor to reduce the size without losing visual quality.

What if OCR doesn’t recognize the text correctly?

OCR accuracy depends on the original scan quality. Poor results are common with very small fonts (below 8pt), handwritten text, decorative or unusual fonts, faded ink, and low-quality scans below 200 DPI. In these cases, rescan the original document at higher quality and run OCR again; if that is not possible, manual retyping may be necessary.
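
When proofreading OCR output, it can help to flag likely misreads automatically before reading the whole text. The sketch below is a simple heuristic, not a feature of any particular OCR tool: it flags tokens that mix letters and digits and contain a character from a small, illustrative set of look-alikes such as O/0 and l/1. The character set and function name are assumptions made for this example, and legitimate codes such as "B2B" will be flagged too.

```python
from __future__ import annotations

import re

# Letters and digits that are easy to confuse with each other.
# Illustrative and non-exhaustive; chosen for this sketch only.
CONFUSABLE = set("O0oIl1|S5B8Z2")

TOKEN = re.compile(r"\S+")


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix letters and digits and contain a look-alike character."""
    flagged = []
    for token in TOKEN.findall(text):
        core = token.strip(".,;:!?()[]\"'")  # ignore surrounding punctuation
        has_letter = any(c.isalpha() for c in core)
        has_digit = any(c.isdigit() for c in core)
        if has_letter and has_digit and any(c in CONFUSABLE for c in core):
            flagged.append(core)
    return flagged


if __name__ == "__main__":
    sample = "Invoice total: 1O5.00 EUR, due 2026-04-08."
    print(suspicious_tokens(sample))
```

Anything the function returns is a candidate for a manual check against the original scan, not a confirmed error.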

Frequently Asked Questions

Why is text in my PDF not selectable?
Your PDF is a scanned image, not a document with an embedded text layer. When a paper document is scanned and saved as PDF, the result is essentially a photograph of the page - pixels shaped like letters, but not actual text. To make it selectable, you need to run OCR (Optical Character Recognition) to add a text layer.
Is there a free way to make scanned PDF text selectable?
Yes - Google Drive offers free OCR: upload the PDF, right-click it, and open with Google Docs. Google automatically recognizes the text. Alternatively, our PDF to Word converter extracts the content into an editable document. For a fully free desktop solution, Tesseract OCR is open-source and very powerful.
Does OCR change how the PDF looks?
No - OCR adds an invisible text layer to the page and leaves the scanned image as it is, so the visual appearance of the PDF stays the same. You just gain the ability to select, copy, and search the text. The only exception is if you convert to Word and re-export, where minor formatting changes may occur.
How accurate is OCR on scanned documents?
Modern OCR is typically 95-99% accurate on clean, printed documents scanned at 300 DPI or higher. Accuracy drops significantly for low-resolution scans, handwriting, unusual fonts, or pages with physical damage. Always proofread the output before relying on it for important documents.
Can I make a handwritten PDF text selectable with OCR?
Standard OCR works poorly on handwriting - it’s designed for printed, typed text. Specialized handwriting recognition tools exist but are far less accurate than printed-text OCR. If the document has mixed print and handwriting, OCR will usually recognize the printed parts but is likely to misread the handwritten sections.
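
To put the accuracy figures quoted above in perspective, the short sketch below converts an accuracy percentage into an expected number of wrong characters. It assumes the percentage is measured per character, which the figures above do not specify, and it uses a hypothetical page of 2,000 characters.

```python
def expected_errors(char_count: int, accuracy: float) -> int:
    """Expected number of misrecognized characters at a per-character accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(char_count * (1.0 - accuracy))


if __name__ == "__main__":
    page_chars = 2000  # hypothetical page length, for illustration only
    for accuracy in (0.95, 0.99):
        errors = expected_errors(page_chars, accuracy)
        print(f"{accuracy:.0%} accuracy: about {errors} wrong characters per page")
```

Under these assumptions, even a clean scan can leave a few dozen errors on every page, which is why proofreading matters for important documents.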
