OCR PDF

Make scanned PDFs searchable, or extract their text.

One PDF at a time

Run OCR on a scanned PDF to make its text selectable and searchable. Drop a PDF, pick a language, and get back either a searchable PDF (same look as the original, with an invisible text layer over each page) or a plain text file. All processing happens in your browser via Tesseract.js — no upload.

How it works

4-step walkthrough

  1. Drop a scanned PDF

    Works on any PDF, but it is most useful for scanned documents whose text isn't already searchable. Born-digital PDFs (exported from Word, Google Docs, and similar tools) already have selectable text and don't need OCR.

  2. Pick output mode and language

    "Searchable PDF" produces an output that looks identical to the input, with every detected word overlaid as invisible text, so your reader's Find feature works and you can copy and paste. "Plain text" gives you a .txt file with page-marker headers. Pick the language that matches the document content for best accuracy; Tesseract's English model is the default.

  3. Watch the per-page progress

    Dropvert renders each page at 2× density to improve OCR accuracy, passes the image to Tesseract.js, and collects word-level bounding boxes. A scanned 10-page document typically takes 60–90 seconds on average hardware (roughly 6–9 seconds per page). The browser tab stays responsive.

  4. Download

    You get a searchable PDF or a .txt file, depending on the mode you picked. The output goes through your normal browser download flow.
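
The plain-text mode from step 2 can be pictured as joining each page's recognized text under a page-marker header. The sketch below is illustrative only: the `buildPlainText` helper and the `--- Page N ---` header format are assumptions, since the page does not state what Dropvert's markers actually look like.

```typescript
// Hypothetical helper: joins per-page OCR text into one .txt body.
// The "--- Page N ---" header format is an assumption, not Dropvert's
// documented output format.
function buildPlainText(pageTexts: string[]): string {
  return pageTexts
    // Pages are numbered from 1; surrounding whitespace is trimmed per page.
    .map((text, index) => `--- Page ${index + 1} ---\n${text.trim()}`)
    // A blank line separates consecutive pages.
    .join("\n\n");
}
```

For example, `buildPlainText(["Invoice 42", "Total due"])` would yield two headed sections separated by a blank line.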

Why use Dropvert

Local-first, free, no upload required

  • Browser-side OCR — your scanned contracts, statements, and personal documents never get uploaded.
  • Two output modes for different needs: searchable PDF for archival use, plain text for quick extraction.
  • 13 language presets covering English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (simplified + traditional), Japanese, Korean, Arabic.
  • Same OCR engine (Tesseract) used by the existing Image to Text tool: a proven model with 95–99% accuracy on clean printed text.
  • No watermark, no signup.
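
As a rough illustration of the language presets listed above, each display name can be mapped to the traineddata code that Tesseract uses for that language. The codes below are Tesseract's standard ones; the `LANGUAGE_CODES` map and `toTesseractCode` helper are hypothetical names, and whether Dropvert's dropdown uses these exact labels is an assumption.

```typescript
// Display names from the preset list mapped to Tesseract's standard
// traineddata codes. The map and helper names are illustrative assumptions.
const LANGUAGE_CODES = new Map<string, string>([
  ["English", "eng"],
  ["Spanish", "spa"],
  ["French", "fra"],
  ["German", "deu"],
  ["Italian", "ita"],
  ["Portuguese", "por"],
  ["Dutch", "nld"],
  ["Russian", "rus"],
  ["Chinese (simplified)", "chi_sim"],
  ["Chinese (traditional)", "chi_tra"],
  ["Japanese", "jpn"],
  ["Korean", "kor"],
  ["Arabic", "ara"],
]);

// Falls back to English, which the page names as the default model.
function toTesseractCode(displayName: string): string {
  return LANGUAGE_CODES.get(displayName) ?? "eng";
}
```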

Frequently asked questions

7 answered

Will the output PDF look exactly like the input?
Yes, in searchable-PDF mode. We don't re-rasterize the original pages; we add an invisible text layer on top and leave the original visible content untouched. The file size grows slightly (text layer plus an embedded font subset).
How accurate is the OCR?
Tesseract is 95–99% accurate on clean printed English at 150+ DPI. Low-resolution scans (under 100 DPI), heavily skewed or rotated pages, handwriting, decorative fonts, and very small text all reduce accuracy. For best results, scan source documents at 300 DPI in black-and-white or grayscale.
How big a PDF can I OCR?
20-30 pages is comfortable. Beyond that, expect multi-minute processing times. Tesseract.js plus our 2× render scale puts pressure on browser memory; very large scanned documents (100+ pages) may need to be split first.
Why is the searchable PDF text invisible?
This is the standard technique for OCR'd PDFs: keep the original page image visible and lay an invisible (opacity 0) text layer on top, with each word at its detected bounding-box position. The reader's Find function and copy-paste both work against the invisible layer; the visible page still looks like the scan. This is how Adobe's OCR, ABBYY FineReader, and most other OCR tools structure their output.
Can I OCR a PDF that already has selectable text?
You can, but there is no benefit. Born-digital PDFs already have a proper text layer; running OCR on top would duplicate the text imperfectly. Use your PDF reader's Find function instead.
Are my files uploaded?
No. The PDF, the OCR engine, and the output all stay in your browser. Tesseract.js downloads its language model (a few MB per language) from a CDN on first use; that's the only network request.
Why didn't my Spanish/French text get recognized?
You probably left the language set to English. Tesseract's accuracy drops sharply when the selected language doesn't match the document. Switch to the right language in the dropdown; the language model downloads once and is cached, so repeat runs are fast.
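
Placing each word of the invisible text layer at its detected position means converting a bounding box from pixels of the rendered image back to PDF page coordinates. The sketch below shows one way this could be done, under stated assumptions: boxes use the `x0`/`y0`/`x1`/`y1` shape that Tesseract.js reports for words, the 2× render means two pixels per PDF point, and the page is unrotated with its origin at the bottom-left. The `pixelBoxToPdfRect` helper is a hypothetical name, not Dropvert's actual code.

```typescript
// Word box in pixels of the rendered page image, origin at top-left.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Rectangle in PDF points, origin at bottom-left (the PDF default).
interface PdfRect { x: number; y: number; width: number; height: number; }

// renderScale: pixels per PDF point used when rasterizing (assumed 2 for the 2x render).
// pageHeightPt: page height in PDF points (792 for US Letter).
function pixelBoxToPdfRect(
  box: PixelBox,
  renderScale: number,
  pageHeightPt: number,
): PdfRect {
  return {
    x: box.x0 / renderScale,
    // Flip the vertical axis: image y grows downward, PDF y grows upward,
    // so the rectangle's bottom edge comes from the box's lower edge (y1).
    y: pageHeightPt - box.y1 / renderScale,
    width: (box.x1 - box.x0) / renderScale,
    height: (box.y1 - box.y0) / renderScale,
  };
}
```

With a 2× render of a US Letter page, a word detected at pixels (200, 100) to (400, 140) maps to a 100 × 20 point rectangle whose bottom-left corner sits at (100, 722).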
