Question 1

Will the output PDF look exactly like the input?

Accepted Answer

Yes, in searchable-PDF mode. We don't re-rasterize the original pages; we add an invisible text layer on top and leave the visible content untouched. The file size grows slightly because the text layer and an embedded font subset are added.

Question 2

How accurate is the OCR?

Accepted Answer

Tesseract is 95-99% accurate on clean printed English at 150 DPI or higher. Low-resolution scans (under 100 DPI), heavily skewed or rotated pages, handwriting, decorative fonts, and very small text all reduce accuracy. For best results, scan source documents at 300 DPI in black-and-white or grayscale.
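
If you're not sure what resolution a scan has, you can estimate it from the image's pixel width and the PDF page width. The sketch below is only an illustration: the function names and the "borderline" label are made up for this example, and it assumes the scanned image fills the full page width. PDF page sizes are measured in points, at 72 points per inch.

```typescript
// PDF page sizes are measured in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

// Effective resolution of a page image that spans the full page width.
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  const pageWidthIn = pageWidthPt / POINTS_PER_INCH;
  return imageWidthPx / pageWidthIn;
}

type ScanQuality = "low" | "borderline" | "ok";

// Thresholds come from the answer above: under 100 DPI hurts accuracy,
// and the quoted accuracy range applies at 150 DPI or higher. The
// "borderline" label for the gap in between is this sketch's own naming.
function classifyDpi(dpi: number): ScanQuality {
  if (dpi < 100) return "low";
  if (dpi < 150) return "borderline";
  return "ok";
}

// US Letter is 612 pt (8.5 in) wide, so a 2550 px wide scan is 300 DPI.
const scanDpi = effectiveDpi(2550, 612);
console.log(scanDpi, classifyDpi(scanDpi));
```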

Question 3

How big a PDF can I OCR?

Accepted Answer

20-30 pages is comfortable. Beyond that, expect multi-minute processing times. Tesseract.js plus our 2× render scale puts pressure on browser memory; very large scanned documents (100+ pages) may need to be split first.
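
To see why page count matters, here is a rough sketch of the arithmetic and of splitting a document into batches. It is not our actual implementation: the function names are hypothetical, and the memory estimate assumes each page is rendered as an uncompressed RGBA bitmap whose pixel size is the page size in points multiplied by the render scale.

```typescript
// Split a page count into inclusive, 1-based [first, last] page ranges.
function pageBatches(totalPages: number, batchSize: number): Array<[number, number]> {
  if (totalPages < 0 || totalPages % 1 !== 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (batchSize < 1 || batchSize % 1 !== 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: Array<[number, number]> = [];
  for (let first = 1; first <= totalPages; first += batchSize) {
    batches.push([first, Math.min(first + batchSize - 1, totalPages)]);
  }
  return batches;
}

// Rough size of one rendered page as an RGBA bitmap (4 bytes per pixel).
// Assumption: pixel dimensions = page size in points * render scale.
function rgbaBytesPerPage(widthPt: number, heightPt: number, scale: number): number {
  return Math.ceil(widthPt * scale) * Math.ceil(heightPt * scale) * 4;
}

// A 100-page document split into 25-page pieces.
console.log(pageBatches(100, 25));

// One US Letter page (612 x 792 pt) at a 2x render scale.
console.log(rgbaBytesPerPage(612, 792, 2));
```

Under those assumptions a single US Letter page at 2× is roughly 7.8 MB of raw pixels, so holding many pages in memory at once adds up quickly. Splitting a long scan into pieces of 20 to 30 pages keeps each run in the comfortable range.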

Question 4

Why is the searchable PDF text invisible?

Accepted Answer

This is the standard technique for OCR'd PDFs: the original page image stays visible, and an invisible (opacity 0) text layer goes on top with each word placed at its detected bounding box. The reader's Find function and copy-paste both work against the invisible layer, while the visible page still looks like the scan. This is how Adobe's OCR, ABBYY FineReader, and most other OCR tools structure their output.
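
Placing each word "at its detected bounding-box position" involves a coordinate conversion, sketched below. This is a simplified illustration, not our exact code: the type and function names are hypothetical, the `x0`/`y0`/`x1`/`y1` box shape is modelled on how Tesseract.js reports word boxes, and the page is assumed to be unrotated with its origin at the bottom-left corner. OCR boxes are in image pixels with the origin at the top-left, while PDF coordinates are in points with the origin at the bottom-left, so the box is divided by the render scale and flipped vertically.

```typescript
// Word box in image pixels, origin at the top-left corner of the image.
interface PixelBox {
  x0: number; // left
  y0: number; // top
  x1: number; // right
  y1: number; // bottom
}

// Rectangle in PDF user space: points, origin at the bottom-left corner.
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert an OCR word box to a PDF rectangle.
// renderScale is the factor the page was rendered at before OCR (e.g. 2).
function pixelBoxToPdfRect(
  box: PixelBox,
  renderScale: number,
  pageHeightPt: number
): PdfRect {
  return {
    x: box.x0 / renderScale,
    // Flip the y axis: the box's bottom edge (y1) becomes the rect's y.
    y: pageHeightPt - box.y1 / renderScale,
    width: (box.x1 - box.x0) / renderScale,
    height: (box.y1 - box.y0) / renderScale,
  };
}

// A word detected on a US Letter page (792 pt tall) rendered at 2x.
const wordRect = pixelBoxToPdfRect({ x0: 200, y0: 100, x1: 400, y1: 140 }, 2, 792);
console.log(wordRect);
```

The invisible text for that word would then be drawn inside `wordRect`, sized to fit its width and height, so that selecting or searching highlights the right spot on the scan.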

Question 5

Can I OCR a PDF that already has selectable text?

Accepted Answer

You can, but there is no benefit. Born-digital PDFs already have a proper text layer; running OCR on top would add an imperfect duplicate of that text. Use your PDF reader's Find function instead.
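
If you process many files and want to skip the ones that don't need OCR, a simple heuristic can help. The sketch below assumes you have already extracted the text of each page with whatever PDF library you use; the function name and both thresholds are hypothetical choices for illustration, not something this tool exposes.

```typescript
// Decide whether a PDF already has a usable text layer.
// pageTexts: the extracted text of each page, one string per page.
// A page counts as "has text" when it holds at least minCharsPerPage
// non-whitespace characters. Both defaults are arbitrary assumptions.
function hasUsableTextLayer(
  pageTexts: string[],
  minCharsPerPage: number = 20,
  minPageFraction: number = 0.5
): boolean {
  if (pageTexts.length === 0) return false;
  const pagesWithText = pageTexts.filter(
    (text) => text.replace(/\s+/g, "").length >= minCharsPerPage
  ).length;
  return pagesWithText / pageTexts.length >= minPageFraction;
}

// A pure scan usually yields empty strings for every page.
console.log(hasUsableTextLayer(["", "", ""]));
```

A file that fails this check is a likely scan and a good OCR candidate; one that passes can usually be searched as-is.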

Question 6

Are my files uploaded?

Accepted Answer

No. The PDF, the OCR engine, and the output all stay in your browser. Tesseract.js downloads its language model (a few MB per language) from a CDN on first use; that's the only network request.

Question 7

Why didn't my Spanish/French text get recognized?

Accepted Answer

You probably left the language set to English. Tesseract's accuracy drops sharply when the selected language doesn't match the document. Switch to the right language in the dropdown. The language model downloads once and is then cached, so repeat runs are fast.
