OCR PDF
Make scanned PDFs searchable. Or extract their text.
Run OCR on a scanned PDF to make its text selectable and searchable. Drop a PDF, pick a language, and get back either a searchable PDF (same look as the original, with an invisible text layer over each page) or a plain text file. All processing happens in your browser via Tesseract.js — no upload.
How it works
4-step walkthrough
1. Drop a scanned PDF
Works on any PDF, but it's most useful for scanned documents whose text isn't already searchable. Born-digital PDFs (exported from Word, Google Docs, etc.) already have selectable text and don't need OCR.
2. Pick output mode and language
"Searchable PDF" produces output that looks identical to the input, with every detected word overlaid as invisible text, so your reader's Find feature works and you can copy and paste. "Plain text" gives you a .txt file with a header marking the start of each page. Pick the language closest to the document's content for best accuracy; English is the default.
3. Watch the per-page progress
Dropvert renders each page at 2× scale (higher resolution improves OCR accuracy), passes the image to Tesseract.js, and collects word-level bounding boxes. A scanned 10-page document typically takes 60–90 seconds on average hardware, and the browser tab stays responsive while it runs.
4. Download
Searchable PDF or .txt depending on what you picked. The output goes through your normal browser download flow.
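The bounding boxes from step 3 are measured in pixels of the 2×-rendered page image, with the origin at the top-left. PDF text, by contrast, is positioned in points (1/72 inch) measured from the bottom-left. Below is a minimal TypeScript sketch of that conversion. It assumes a pdf.js-style render where scale 1 means one pixel per point (72 DPI), so 2× works out to about 144 DPI. It also ignores page rotation and crop-box offsets. The interface and function names are hypothetical and are not Dropvert's code.

```typescript
// Word box in rendered-image pixels, origin at the top-left. The field
// names mirror the x0/y0/x1/y1 bbox shape Tesseract.js reports per word.
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Rectangle in PDF user space: points (1/72 inch), origin at the bottom-left.
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumption: scale 1 renders one pixel per PDF point, i.e. 72 DPI.
function effectiveDpi(scale: number): number {
  return 72 * scale;
}

// Divide by the render scale to get back to points, then flip the y axis,
// because PDF measures y upward from the bottom edge of the page.
function pixelBoxToPdf(box: PixelBox, scale: number, pageHeightPt: number): PdfRect {
  return {
    x: box.x0 / scale,
    y: pageHeightPt - box.y1 / scale,
    width: (box.x1 - box.x0) / scale,
    height: (box.y1 - box.y0) / scale,
  };
}

// US Letter is 612 x 792 pt, so at 2x the rendered image is 1224 x 1584 px.
const rect = pixelBoxToPdf({ x0: 200, y0: 100, x1: 400, y1: 140 }, 2, 792);
// rect is { x: 100, y: 722, width: 100, height: 20 }
console.log(effectiveDpi(2), rect);
```

Dividing by the scale undoes the upsampling used for OCR. As a result, the text layer lines up with the original page regardless of the render resolution.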
Why use Dropvert
Local-first, free, no upload required
- Browser-side OCR — your scanned contracts, statements, and personal documents never get uploaded.
- Two output modes for different needs: searchable PDF for archival use, plain text for quick extraction.
- 13 language presets covering English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (simplified + traditional), Japanese, Korean, Arabic.
- Same OCR engine (Tesseract) as the existing Image to Text tool: a proven model that reaches 95-99% accuracy on clean printed text.
- No watermark, no signup.
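Each preset ultimately selects one of Tesseract's trained language models, identified by a short code. The sketch below maps the dropdown labels listed above to those codes. The labels and the `tesseractCode` helper are hypothetical; the codes are the standard Tesseract traineddata names.

```typescript
// Hypothetical mapping from the 13 preset labels to Tesseract's
// traineddata language codes.
const LANGUAGE_PRESETS: Record<string, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Italian: "ita",
  Portuguese: "por",
  Dutch: "nld",
  Russian: "rus",
  "Chinese (simplified)": "chi_sim",
  "Chinese (traditional)": "chi_tra",
  Japanese: "jpn",
  Korean: "kor",
  Arabic: "ara",
};

// Fall back to English, the tool's default, for any unknown label.
function tesseractCode(label: string): string {
  return LANGUAGE_PRESETS[label] ?? "eng";
}

console.log(tesseractCode("Chinese (traditional)"));
```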
Frequently asked questions
7 answered
- Will the output PDF look exactly like the input?
- Yes for the searchable-PDF mode. We don't re-rasterize the original pages — we add an invisible text layer on top, leaving the original visible content untouched. The file size grows slightly (text layer + font subset embedded).
- How accurate is the OCR?
- Tesseract is 95-99% accurate on clean printed English at 150+ DPI. Lower DPI scans (under 100 DPI), heavily skewed/rotated pages, handwriting, decorative fonts, and very small text all reduce accuracy. For best results, scan source documents at 300 DPI in black-and-white or grayscale.
- How big a PDF can I OCR?
- 20-30 pages is comfortable. Beyond that, expect multi-minute processing times. Tesseract.js plus our 2× render scale puts pressure on browser memory; very large scanned documents (100+ pages) may need to be split first.
- Why is the searchable PDF text invisible?
- It's the standard technique for OCR'd PDFs: keep the original page image and lay an invisible (opacity 0) text layer on top, with each word at its detected bounding-box position. The reader's Find function and copy-paste both work against the invisible layer, while the visible page still looks like the scan. Adobe's OCR, ABBYY FineReader, and most other OCR tools structure their output the same way.
- Can I OCR a PDF that already has selectable text?
- You can, but there's no benefit. Born-digital PDFs already have a proper text layer, and running OCR on top would only add an imperfect duplicate of that text. Use your PDF reader's Find function instead.
- Are my files uploaded?
- No. The PDF, the OCR engine, and the output all stay in your browser. Tesseract.js downloads its language model (a few MB per language) from a CDN on first use; that's the only network request.
- Why didn't my Spanish/French text get recognized?
- You probably left the language as English. Tesseract's accuracy drops sharply when the language doesn't match. Switch to the right language in the dropdown — the language model downloads once and caches, so repeat runs are fast.
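At the PDF level, the invisible layer comes down to a few text operators per word. The FAQ describes the layer as opacity 0. This sketch instead uses text render mode 3 (`3 Tr`), which the PDF specification defines as neither filling nor stroking glyphs and which Tesseract's own PDF output also uses.

The sketch makes several assumptions:
- Word boxes are already converted to PDF points with a bottom-left origin.
- Words are ASCII-only.
- The standard Courier font is registered as `/F1` in the page resources. Every Courier glyph is 600/1000 em wide, which keeps the width math trivial.

Dropvert embeds a font subset instead, so treat every name here as hypothetical.

```typescript
// A recognized word, already in PDF user space (points, origin bottom-left).
interface WordBox {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Wrap text as a PDF literal string, escaping backslash and parentheses.
function pdfLiteral(s: string): string {
  return "(" + s.replace(/[\\()]/g, (c) => "\\" + c) + ")";
}

// Build content-stream operators that place each word invisibly.
// Assumes Courier (600/1000 em per glyph) is available as /F1.
function invisibleTextOps(words: WordBox[]): string {
  const ops: string[] = ["BT", "3 Tr"]; // render mode 3: invisible text
  for (const w of words) {
    if (w.text.length === 0 || w.width <= 0 || w.height <= 0) continue;
    const size = w.height; // approximate the font size from the box height
    const natural = 0.6 * size * w.text.length; // Courier width at this size
    const hScale = (100 * w.width) / natural; // Tz percent to fill the box width
    ops.push(
      `/F1 ${size.toFixed(2)} Tf`,
      `${hScale.toFixed(2)} Tz`,
      // Box bottom stands in for the baseline; descenders make this approximate.
      `1 0 0 1 ${w.x.toFixed(2)} ${w.y.toFixed(2)} Tm`,
      `${pdfLiteral(w.text)} Tj`,
    );
  }
  ops.push("ET");
  return ops.join("\n");
}

const ops = invisibleTextOps([{ text: "Total", x: 100, y: 722, width: 30, height: 10 }]);
console.log(ops);
```

Horizontal scaling (`Tz`) stretches each word to its box width. Selecting text therefore highlights roughly the same region as the printed word.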