Tesseract.js Browser OCR Complete Tutorial

A hands-on tutorial for running OCR in the browser with Tesseract.js 7.0.0 — worker setup, multilingual recognition, progress tracking, bounding boxes, and production pitfalls.

Sources & References

Tested with tesseract.js v7.0.0

Introduction

Tesseract.js is a pure JavaScript port of the Tesseract OCR engine, compiled to WebAssembly so it runs entirely in a browser tab or a Node.js process. No server round trips, no upload dialogs, no cloud bills — the image goes straight into a WebWorker and the recognized text comes back as a plain object. The project wraps the classic Tesseract 5 engine, which has been developed in the open for well over a decade, so accuracy matches the C++ original for Latin scripts and handles more than 100 languages when the matching traineddata files are loaded.

The browser-native angle opens doors that server OCR cannot. A recipe app can scan a grocery receipt the moment a user snaps a photo, without ever sending the image off-device. A classroom tool can digitize a handwritten worksheet for a student who has no network access. An accessibility extension can extract text from screenshots and pipe it into a screen reader. Each case benefits from keeping the pixels local — the privacy story is simple, latency is bounded by local compute rather than the network, and the marginal cost per recognition is zero.

This tutorial targets Tesseract.js 7.0.0, the current stable release on npm. Every example runs against the modern Worker API (createWorker), which replaces the legacy TesseractWorker class that shipped with 2.x. The code compiles without changes under Vite, Webpack, Next.js, and Cloudflare Pages, and the same APIs work in Node when you swap the image input for a Buffer.

Installation and Setup

Install the package from npm:

npm install tesseract.js@7.0.0

Tesseract.js ships three runtime assets: the main JavaScript module, the WebAssembly core (tesseract.js-core), and per-language traineddata files. The library downloads the core and language files on first use and caches them in IndexedDB. You do not bundle the WebAssembly yourself — the worker fetches it from a CDN mirror by default.

import { createWorker } from 'tesseract.js';

// Worker boots asynchronously, loads the core wasm, and then the language
const worker = await createWorker('eng');
const { data } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
console.log(data.text);
await worker.terminate();

Prefer a <script> tag? The jsDelivr CDN exposes a UMD build that attaches Tesseract to the global scope:

<script src="https://cdn.jsdelivr.net/npm/tesseract.js@7.0.0/dist/tesseract.min.js"></script>
<script>
  // Classic scripts cannot use top-level await, so wrap the calls in an async IIFE
  (async () => {
    const worker = await Tesseract.createWorker('eng');
    const result = await worker.recognize('/receipt.jpg');
    document.getElementById('out').textContent = result.data.text;
    await worker.terminate();
  })();
</script>

When Tesseract.js runs in the browser it spins up a dedicated WebWorker, so the main thread stays responsive. Under Node.js the library uses worker_threads for the same effect. No special bundler configuration is required on modern stacks; Vite, Webpack 5, and Parcel 2 recognize the worker entry point automatically.

Core Features

Basic OCR from an Image URL

The smallest useful program loads one language, recognizes one image, and prints the text. recognize() accepts URLs, Blobs, Files, ImageData, Canvas elements, and even raw ArrayBuffer payloads.

import { createWorker } from 'tesseract.js';

async function readSign() {
  // createWorker returns a warmed-up worker with the language already loaded
  const worker = await createWorker('eng');
  const { data } = await worker.recognize('/images/street-sign.jpg');

  console.log('Confidence:', data.confidence);
  console.log('Text:', data.text);

  // Always release the worker — it holds a large wasm instance
  await worker.terminate();
}

The returned data object carries more than raw text: data.confidence is a 0-100 score across the whole image, data.lines, data.words, and data.symbols break the result into hierarchical units, and data.hocr and data.tsv provide standard OCR output formats.
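
Because each word carries its own confidence, a small helper can surface the shaky spots for manual review. A minimal sketch, assuming only the { words: [{ text, confidence }] } result shape described above:

```javascript
// Sketch: collect words the engine was unsure about so the UI can flag them.
// Assumes the documented { words: [...] } result shape.
function shakyWords(data, threshold = 60) {
  return data.words
    .filter((w) => w.confidence < threshold)
    .map((w) => ({ text: w.text, confidence: w.confidence }));
}
```

Anything under roughly 60 is usually worth a second look; tune the threshold per document type.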

OCR from File Input or Canvas

User-uploaded files and on-screen canvases are the most common inputs in real apps. Both flow through the same recognize() call — Tesseract.js converts them to ImageData internally.

<input type="file" id="picker" accept="image/*" />
<pre id="result"></pre>
<script type="module">
  import { createWorker } from 'tesseract.js';

  const worker = await createWorker('eng');
  const picker = document.getElementById('picker');

  picker.addEventListener('change', async (event) => {
    const file = event.target.files[0];
    if (!file) return;

    // File objects implement Blob — Tesseract.js reads them directly
    const { data } = await worker.recognize(file);
    document.getElementById('result').textContent = data.text;
  });
</script>

To OCR a portion of the screen, draw that region onto a canvas first and pass the canvas element:

const canvas = document.createElement('canvas');
canvas.width = 800;
canvas.height = 300;
const ctx = canvas.getContext('2d');
// Copy the region you care about from an existing image or video frame
ctx.drawImage(sourceImage, cropX, cropY, 800, 300, 0, 0, 800, 300);

const { data } = await worker.recognize(canvas);
console.log(data.text);

Multiple Languages (English, Korean, and More)

Tesseract.js supports more than 100 languages. Pass a single language code, an array, or a plus-delimited string when the document mixes scripts. The traineddata files live on the jsDelivr mirror by default and land in IndexedDB after the first download.

import { createWorker } from 'tesseract.js';

// Single language
const koreanWorker = await createWorker('kor');

// Multiple languages — the engine tries each and picks the best hypothesis per word
const bilingualWorker = await createWorker(['eng', 'kor']);

// Recognize a Korean document
const { data: koreanResult } = await koreanWorker.recognize('/images/menu-ko.jpg');
console.log(koreanResult.text);

// Recognize a document containing both English and Korean
const { data: mixed } = await bilingualWorker.recognize('/images/bilingual-notice.png');
console.log(mixed.text);

await koreanWorker.terminate();
await bilingualWorker.terminate();

Language codes follow ISO 639-2/T — for example eng, kor, jpn, chi_sim, chi_tra, ara, deu, fra, spa, rus. The tessdata GitHub repository lists every available file, and the engine also supports vertical-text variants like kor_vert and jpn_vert.

Progress Tracking with a Logger

WebAssembly warm-up and recognition both take non-trivial time, especially on the first run when the core downloads. Attach a logger when you create the worker to feed a progress bar.

import { createWorker } from 'tesseract.js';

const progressBar = document.getElementById('progress');
const statusLabel = document.getElementById('status');

const worker = await createWorker('eng', 1, { // second argument is the OCR engine mode (1 = LSTM only)
  logger: (m) => {
    // m.status describes the stage, m.progress runs from 0.0 to 1.0
    statusLabel.textContent = m.status;
    progressBar.value = Math.round(m.progress * 100);
  },
});

await worker.recognize('/images/long-document.png');
await worker.terminate();

The logger emits stages such as loading tesseract core, initializing api, loading language traineddata, and recognizing text. Recognize stages fire roughly every 100 ms so the UI updates smoothly. Log those values to analytics if you want to profile recognition times against image size.
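
To keep the bar from jumping backwards as stages change, you can count only the recognition stage and show a spinner for the rest. A sketch assuming the status strings listed above:

```javascript
// Sketch: convert a logger message to a 0-100 percentage, counting only the
// 'recognizing text' stage; earlier stages return null so the UI can fall back
// to an indeterminate spinner. Assumes the status strings the logger emits.
function recognitionPercent(m) {
  if (m.status !== 'recognizing text') return null;
  return Math.round(m.progress * 100);
}
```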

Whitelist and Blacklist Characters

Restricting the character set cuts recognition errors when you know the domain: a license-plate reader should only emit letters and digits, an invoice parser needs no punctuation beyond the dot and comma, and a barcode fallback can skip letters entirely.

import { createWorker } from 'tesseract.js';

const worker = await createWorker('eng');

// Allow only uppercase letters and digits — good for license plates
await worker.setParameters({
  tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
});

const { data } = await worker.recognize('/images/plate.jpg');
console.log(data.text.trim());

// Or block specific characters you never want in the output
await worker.setParameters({
  tessedit_char_whitelist: '',
  tessedit_char_blacklist: '|<>{}[]',
});

await worker.terminate();

Set tessedit_char_whitelist to an empty string to clear a previous restriction. Combining the whitelist with a page segmentation mode (tessedit_pageseg_mode) tightens results further: PSM.SINGLE_LINE suits receipts, PSM.SINGLE_WORD suits badges, PSM.AUTO handles full pages.
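
Choosing the mode by input type can be sketched as below. The numeric values mirror the tesseract CLI --psm numbers (tesseract.js exports the same set as its PSM enum); the psmFor helper and its kind labels are illustrative, not part of the library:

```javascript
// Sketch: pick a page segmentation mode per input type. Values mirror --psm.
const PSM_AUTO = '3';        // fully automatic page segmentation
const PSM_SINGLE_LINE = '7'; // treat the image as one text line (receipts)
const PSM_SINGLE_WORD = '8'; // treat the image as one word (badges)

function psmFor(kind) {
  if (kind === 'receipt') return PSM_SINGLE_LINE;
  if (kind === 'badge') return PSM_SINGLE_WORD;
  return PSM_AUTO;
}

// await worker.setParameters({ tessedit_pageseg_mode: psmFor('receipt') });
```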

Bounding Boxes and Word Positions

The recognizer returns coordinates for every word and symbol. Use those boxes to highlight text on an overlay, build a searchable PDF, or pipe structured data into downstream tooling.

import { createWorker } from 'tesseract.js';

const worker = await createWorker('eng');
const { data } = await worker.recognize('/images/document.png');

// Draw a green rectangle around each recognized word
const overlay = document.getElementById('overlay').getContext('2d');
overlay.strokeStyle = '#22c55e';
overlay.lineWidth = 2;

for (const word of data.words) {
  const { x0, y0, x1, y1 } = word.bbox;
  overlay.strokeRect(x0, y0, x1 - x0, y1 - y0);
  console.log(`${word.text}`.padEnd(20), `conf=${word.confidence.toFixed(1)}`);
}

await worker.terminate();

Every word exposes bbox (pixel coordinates in the input image), text, confidence, and choices (alternate guesses the engine ranked just below the top pick). The data.lines and data.symbols arrays expose the same shape at different granularities. Push the JSON into IndexedDB when you want to cache parsed results between sessions.
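
Since bbox coordinates are in input-image pixels, an overlay drawn over a resized on-screen image needs each box scaled first. A minimal sketch:

```javascript
// Sketch: map a bbox from input-image pixels to on-screen pixels, given the
// displayed size and the image's natural size (e.g. img.naturalWidth).
function scaleBbox(bbox, displayWidth, displayHeight, naturalWidth, naturalHeight) {
  const sx = displayWidth / naturalWidth;
  const sy = displayHeight / naturalHeight;
  return {
    x: bbox.x0 * sx,
    y: bbox.y0 * sy,
    width: (bbox.x1 - bbox.x0) * sx,
    height: (bbox.y1 - bbox.y0) * sy,
  };
}
```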

Worker Lifecycle Management

Creating a worker is expensive — the WebAssembly core weighs roughly 2 MB and each traineddata file is another 5-15 MB. Create workers once and reuse them across many images. Terminate them when the user navigates away.

import { createWorker } from 'tesseract.js';

class OcrService {
  constructor() {
    this.worker = null;
  }

  async getWorker() {
    // Lazy-initialize so we do not block the first paint
    if (!this.worker) {
      this.worker = await createWorker(['eng', 'kor']);
    }
    return this.worker;
  }

  async read(image) {
    const worker = await this.getWorker();
    const { data } = await worker.recognize(image);
    return data.text;
  }

  async dispose() {
    if (this.worker) {
      await this.worker.terminate();
      this.worker = null;
    }
  }
}

// In a React component, dispose inside a useEffect cleanup
// In a plain page, listen for beforeunload
const service = new OcrService();
window.addEventListener('beforeunload', () => service.dispose());

For throughput-heavy pipelines, the createScheduler() helper distributes jobs across several workers. Add 2-4 workers to the scheduler and it queues recognize calls round-robin — useful when a user drops a folder of images onto the page.
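
The batch pattern can be sketched as a small helper. The scheduler argument is assumed to follow the tesseract.js shape (addJob('recognize', image) resolving to { data }); the ocrBatch name is mine, not part of the library:

```javascript
// Sketch: fan a batch of images out across a scheduler's worker pool and
// collect the recognized text in input order.
async function ocrBatch(scheduler, images) {
  const jobs = images.map((img) => scheduler.addJob('recognize', img));
  const results = await Promise.all(jobs);
  return results.map((r) => r.data.text);
}

// Real usage (assumption: three workers is a reasonable pool for a desktop tab):
// import { createScheduler, createWorker } from 'tesseract.js';
// const scheduler = createScheduler();
// for (let i = 0; i < 3; i++) scheduler.addWorker(await createWorker('eng'));
// const texts = await ocrBatch(scheduler, droppedFiles);
// await scheduler.terminate();
```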

Common Pitfalls

Language traineddata downloads. The first createWorker('kor') fetches a ~16 MB file from the CDN. Show a loading indicator and warn users on slow connections. Host the traineddata on your own origin by setting langPath if CDN access is blocked in your environment.
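
Self-hosting looks roughly like this — workerPath, corePath, and langPath are real createWorker options, but the /tess/ layout is an assumption about where you copied the files, not a default:

```javascript
// Sketch: point the worker at same-origin copies of the runtime assets.
// The /tess/ paths are hypothetical — copy the files wherever suits your build.
const assetOptions = {
  workerPath: '/tess/worker.min.js',  // from node_modules/tesseract.js/dist
  corePath: '/tess/core',             // directory holding the tesseract.js-core builds
  langPath: '/tess/lang-data',        // serves eng.traineddata.gz, kor.traineddata.gz, ...
};
// const worker = await createWorker('eng', 1, assetOptions);
```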

Memory pressure. Long-lived workers accumulate cached glyph data. Call worker.terminate() between large batches or recreate the worker every 100 recognitions when you process huge archives. Mobile Safari tabs get killed around 400 MB, so monitor with performance.memory when you can.
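
The recreate-every-N-jobs idea can be sketched as a small wrapper. The factory and job are injected so the pattern is testable apart from tesseract.js; makeRecycler is an illustrative name, not a library API:

```javascript
// Sketch: run jobs through a worker, terminating and recreating it after
// `limit` uses to cap memory growth during huge batches.
function makeRecycler(createFn, limit) {
  let worker = null;
  let used = 0;
  return {
    async run(job) {
      if (!worker || used >= limit) {
        if (worker) await worker.terminate();
        worker = await createFn();
        used = 0;
      }
      used += 1;
      return job(worker);
    },
  };
}

// Usage sketch: const recycler = makeRecycler(() => createWorker('eng'), 100);
// const text = await recycler.run(async (w) => (await w.recognize(file)).data.text);
```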

Image preprocessing changes accuracy more than parameters. Feed clean input: crop tight to the text, convert to grayscale, boost contrast, scale small text up to at least 300 DPI equivalent. A blurry phone photo at 640 px wide recognizes poorly; the same image upscaled to 1600 px and binarized often jumps 20 confidence points.
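
Grayscale conversion can happen before the pixels ever reach the worker. A sketch over the RGBA layout of ImageData.data (the Rec. 601 luma weights are a standard choice, not a Tesseract requirement; upscaling is done separately with drawImage onto a larger canvas):

```javascript
// Sketch: in-place luminance grayscale over an RGBA pixel buffer
// (the flat layout used by canvas ImageData.data).
function toGrayscale(rgba) {
  for (let i = 0; i < rgba.length; i += 4) {
    const y = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
    rgba[i] = rgba[i + 1] = rgba[i + 2] = y; // alpha at i + 3 is left untouched
  }
  return rgba;
}

// Usage sketch: const imageData = ctx.getImageData(0, 0, w, h);
// toGrayscale(imageData.data); ctx.putImageData(imageData, 0, 0);
```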

Single worker, single job. A worker processes one recognize() call at a time — queuing multiple jobs against the same worker serializes them. If users trigger parallel recognitions, use createScheduler() with multiple workers or await each call before starting the next.

Alternatives Comparison

Tesseract.js is not the only OCR option for web stacks. Pick the engine that matches your privacy, latency, and accuracy constraints.

Engine | Runs In | Strengths | Trade-offs
Tesseract.js | Browser + Node | Offline, private, free, 100+ languages | Slower than cloud APIs on large images
Google Cloud Vision | Server | Best accuracy on photos, handwriting, dense layouts | Per-request cost, network round trip, data leaves the device
PaddleOCR | Python + ONNX/WASM builds | Excellent CJK accuracy, detection + recognition pipeline | Heavier deploy, ONNX runtime tuning needed for browser

A pragmatic pattern: run Tesseract.js on-device for drafts and interactive previews, then call a cloud OCR only when the user opts into higher accuracy on difficult scans. That keeps per-call costs near zero for most traffic while reserving the paid API for edge cases.

References

The official sources linked above track every release. The GitHub examples/ directory contains runnable demos for scheduler usage, PDF input, and video frame OCR that go beyond this guide. The project documentation site lists every Tesseract parameter so you can tune tessedit_ocr_engine_mode, tessedit_pageseg_mode, and debug flags without guessing.
