Introduction
PDF.js is a JavaScript library built by Mozilla that parses and renders PDF documents using only HTML5 standards, with no native plugins and no server round trips. The project ships as the default PDF viewer in Firefox and powers preview experiences across Chromium-based browsers via third-party embeds. GitLab embeds it for merge request attachments, Overleaf uses it to preview compiled LaTeX output, and many document management systems rely on it for in-browser previews.
The npm package pdfjs-dist exposes the same rendering engine as a module you can drop into any web app. This guide walks through version 5.6.205, the current stable release, covering document loading, canvas rendering, text extraction, viewport scaling, password prompts, and text selection. Every example runs in a modern browser with ES modules.
Why reach for PDF.js instead of an <iframe> with a direct PDF URL? Embedded iframes hand control to the browser's native viewer, which varies widely across vendors: Safari, Firefox, and Chrome each expose a different UI, different shortcut keys, and different levels of accessibility. A PDF.js integration gives you a single, consistent rendering surface that you style and control. You can hide the toolbar, swap out icons, add custom annotations, track analytics events, or wire the viewer into a larger app's state.
Installation and Setup
Install the package from npm:
npm install pdfjs-dist@5.6.205
PDF.js splits work between the main thread and a worker. The worker handles the heavy parsing so the UI stays responsive. You must point the library at the worker file before calling any API.
import * as pdfjsLib from 'pdfjs-dist';
// Point the library at the bundled worker file
pdfjsLib.GlobalWorkerOptions.workerSrc = new URL(
  'pdfjs-dist/build/pdf.worker.min.mjs',
  import.meta.url
).toString();
Version 5.x ships ESM builds only; the legacy UMD bundle was dropped. If your bundler cannot resolve new URL() patterns, the CDN route works too:
<script type="module">
  import * as pdfjsLib from 'https://cdn.jsdelivr.net/npm/pdfjs-dist@5.6.205/build/pdf.mjs';
  pdfjsLib.GlobalWorkerOptions.workerSrc =
    'https://cdn.jsdelivr.net/npm/pdfjs-dist@5.6.205/build/pdf.worker.min.mjs';
</script>
Node.js 20.19+ or 22.13+ is required for local tooling. Browser support covers the latest two versions of Chrome, Firefox, Safari, and Edge.
Core Features
Loading a PDF Document
Loading starts with getDocument(), which returns a loading task. The task's promise resolves to a PDFDocumentProxy once the document header and cross-reference table are parsed. The pages themselves are fetched and parsed on demand.
async function loadPdf(url) {
  // getDocument accepts URL strings, ArrayBuffer, Uint8Array, or a config object
  const loadingTask = pdfjsLib.getDocument(url);
  // Track progress for large files
  loadingTask.onProgress = ({ loaded, total }) => {
    const pct = total ? Math.round((loaded / total) * 100) : 0;
    console.log(`Loading: ${pct}%`);
  };
  const pdf = await loadingTask.promise;
  console.log(`Pages: ${pdf.numPages}`);
  return pdf;
}
Rendering a Page to Canvas
Each page exposes a getViewport() method that computes pixel dimensions for a given scale. Pair the viewport with a canvas 2D context and pass both to render().
async function renderPage(pdf, pageNumber, canvas) {
  const page = await pdf.getPage(pageNumber);
  const scale = 1.5;
  const viewport = page.getViewport({ scale });
  // Match canvas bitmap size to viewport, accounting for device pixel ratio
  const dpr = window.devicePixelRatio || 1;
  canvas.width = Math.floor(viewport.width * dpr);
  canvas.height = Math.floor(viewport.height * dpr);
  canvas.style.width = `${viewport.width}px`;
  canvas.style.height = `${viewport.height}px`;
  const ctx = canvas.getContext('2d');
  ctx.scale(dpr, dpr);
  const renderTask = page.render({
    canvasContext: ctx,
    viewport,
  });
  await renderTask.promise;
  page.cleanup();
}
Calling page.cleanup() after rendering releases page-specific resources. Skip it if you plan to re-render the same page soon.
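As an alternative to calling ctx.scale(), the render() call also accepts a transform matrix that is applied before the viewport transform. The sketch below shows one way to package that; buildRenderParams is a hypothetical helper name, not part of the library, and it only assumes a canvas-like object with style and getContext().

```javascript
// Hypothetical helper (not a pdfjs-dist API): sizes the canvas and builds
// the argument object for page.render().
function buildRenderParams(canvas, viewport, dpr = 1) {
  // Backing-store size in device pixels
  canvas.width = Math.floor(viewport.width * dpr);
  canvas.height = Math.floor(viewport.height * dpr);
  // Layout size in CSS pixels
  canvas.style.width = `${Math.floor(viewport.width)}px`;
  canvas.style.height = `${Math.floor(viewport.height)}px`;

  return {
    canvasContext: canvas.getContext('2d'),
    viewport,
    // Extra matrix for high-DPI output; omitted when no scaling is needed
    transform: dpr !== 1 ? [dpr, 0, 0, dpr, 0, 0] : undefined,
  };
}

// Possible usage inside renderPage():
//   const params = buildRenderParams(canvas, viewport, window.devicePixelRatio || 1);
//   await page.render(params).promise;
```

Keeping the scaling in the render parameters means the 2D context is left untouched, which can be easier to reason about when the same canvas is reused across pages.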
Extracting Text Content
Text extraction runs through page.getTextContent(). The result is an object whose items array holds the text items, each with a string, a transform matrix, and font metadata. Join the strings to reconstruct the page text.
async function extractText(pdf, pageNumber) {
  const page = await pdf.getPage(pageNumber);
  const textContent = await page.getTextContent();
  // Each item has { str, dir, width, height, transform, fontName }
  const text = textContent.items
    .map((item) => item.str)
    .join(' ');
  return text;
}
// Extract all pages in parallel
async function extractAll(pdf) {
  const pageNumbers = Array.from({ length: pdf.numPages }, (_, i) => i + 1);
  const pages = await Promise.all(
    pageNumbers.map((n) => extractText(pdf, n))
  );
  return pages.join('\n\n');
}
Scanned PDFs contain no text layer. An empty items array from getTextContent() means the page is image-only, and you would need an OCR pass to recover characters.
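Joining every item with a space loses line breaks. A minimal sketch of a layout-aware join and an image-only check follows; pageTextFrom and looksImageOnly are hypothetical names, the code relies on the hasEOL flag that text items carry, and it assumes spacing within a line already appears in the item strings.

```javascript
// Hypothetical helpers that work on the object returned by page.getTextContent()
function pageTextFrom(textContent) {
  let text = '';
  for (const item of textContent.items) {
    // Entries without a string (such as marked-content markers) are skipped
    if (typeof item.str !== 'string') continue;
    text += item.str;
    // hasEOL marks the end of a line in the source layout
    if (item.hasEOL) text += '\n';
  }
  return text.trim();
}

// Assumption: a page with no non-whitespace text is treated as image-only
function looksImageOnly(textContent) {
  return pageTextFrom(textContent).length === 0;
}

// Possible usage:
//   const textContent = await page.getTextContent();
//   if (looksImageOnly(textContent)) { /* queue the page for OCR */ }
```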
Handling Multiple Pages
A viewer UI usually renders one page at a time or a virtualized list. Here is a minimal navigation setup with Previous and Next buttons.
class PdfViewer {
  constructor(canvas, pdf) {
    this.canvas = canvas;
    this.pdf = pdf;
    this.currentPage = 1;
    this.scale = 1.5;
  }
  async render() {
    const page = await this.pdf.getPage(this.currentPage);
    const viewport = page.getViewport({ scale: this.scale });
    this.canvas.width = viewport.width;
    this.canvas.height = viewport.height;
    await page.render({
      canvasContext: this.canvas.getContext('2d'),
      viewport,
    }).promise;
  }
  async next() {
    if (this.currentPage < this.pdf.numPages) {
      this.currentPage += 1;
      await this.render();
    }
  }
  async prev() {
    if (this.currentPage > 1) {
      this.currentPage -= 1;
      await this.render();
    }
  }
}
Zoom and Viewport Scaling
Scaling is a property of the viewport. Rebuild the viewport with a new scale value and re-render to zoom in or out.
const ZOOM_LEVELS = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 3.0];
async function zoomIn(viewer) {
  // Pick the next preset above the current scale. This also works when the
  // scale is not itself a preset, for example after fitToWidth().
  const next = ZOOM_LEVELS.find((level) => level > viewer.scale);
  if (next !== undefined) {
    viewer.scale = next;
    await viewer.render();
  }
}
// Fit-to-width calculation
async function fitToWidth(viewer, containerWidth) {
  const page = await viewer.pdf.getPage(viewer.currentPage);
  const unscaledViewport = page.getViewport({ scale: 1 });
  viewer.scale = containerWidth / unscaledViewport.width;
  await viewer.render();
}
For rotation, pass rotation: 90 (or 180, 270) to getViewport(). The library rotates clockwise in 90-degree steps. Combine rotation with scale to build a viewer that handles landscape scans comfortably: a landscape page at scale 1.0 with rotation 90 fits portrait-oriented screens far better than the unrotated version.
A small detail that catches people off guard: scale is applied to the PDF's declared page dimensions, not to CSS pixels. A US Letter page is 612 by 792 points, so scale 1.0 yields a 612-pixel-wide canvas. High-DPI screens still need the devicePixelRatio multiplier shown earlier to avoid blurry text.
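The arithmetic behind those numbers can be sketched in a few lines. scaledPageSize is a hypothetical helper, not a library call; it assumes a page whose box starts at the origin and that has no rotation of its own, so it only mirrors the simple case of what getViewport() computes.

```javascript
// Hypothetical helper mirroring the scale arithmetic for a simple page
function scaledPageSize(widthPt, heightPt, scale, rotation = 0) {
  const turn = ((rotation % 360) + 360) % 360; // normalize to 0..359
  const swap = turn === 90 || turn === 270; // quarter turns swap the axes
  return {
    width: (swap ? heightPt : widthPt) * scale,
    height: (swap ? widthPt : heightPt) * scale,
  };
}

// US Letter is 612 x 792 points
const letter = scaledPageSize(612, 792, 1.5); // 918 x 1188
const rotated = scaledPageSize(612, 792, 1.5, 90); // 1188 x 918

// Fit-to-width: the scale that makes the page 918 pixels wide
const fitScale = 918 / scaledPageSize(612, 792, 1).width; // 1.5
```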
Password-Protected PDFs
Encrypted documents trigger the onPassword callback on the loading task. Respond by calling the updatePassword function passed to that callback with the user input, or destroy the loading task to cancel.
async function loadProtectedPdf(url) {
  const loadingTask = pdfjsLib.getDocument(url);
  loadingTask.onPassword = (updatePassword, reason) => {
    const label = reason === pdfjsLib.PasswordResponses.INCORRECT_PASSWORD
      ? 'Password was wrong. Try again:'
      : 'This document is encrypted. Enter password:';
    const password = window.prompt(label);
    if (password === null) {
      loadingTask.destroy();
    } else {
      updatePassword(password);
    }
  };
  return loadingTask.promise;
}
The reason value distinguishes a first prompt (NEED_PASSWORD) from a retry (INCORRECT_PASSWORD), which lets you show different wording.
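The same callback shape can also cap the number of attempts. The following is a sketch under stated assumptions: makePasswordHandler, ask, giveUp, and maxAttempts are hypothetical names introduced here, and the caller passes in the library's INCORRECT_PASSWORD constant rather than the sketch hard-coding it.

```javascript
// Hypothetical factory (not a pdfjs-dist API) that returns an onPassword callback
function makePasswordHandler({ incorrectCode, ask, giveUp, maxAttempts = 3 }) {
  let attempts = 0;
  return (updatePassword, reason) => {
    attempts += 1;
    // Stop prompting once the attempt budget is used up
    if (attempts > maxAttempts) {
      giveUp();
      return;
    }
    const label = reason === incorrectCode
      ? 'Password was wrong. Try again:'
      : 'This document is encrypted. Enter password:';
    const password = ask(label);
    if (password === null) {
      giveUp(); // the user dismissed the prompt
    } else {
      updatePassword(password);
    }
  };
}

// Possible usage:
//   loadingTask.onPassword = makePasswordHandler({
//     incorrectCode: pdfjsLib.PasswordResponses.INCORRECT_PASSWORD,
//     ask: (label) => window.prompt(label),
//     giveUp: () => loadingTask.destroy(),
//   });
```

Injecting ask and giveUp keeps the handler independent of window.prompt, so a custom dialog can replace it later.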
Text Selection and Search
Canvas alone does not support text selection. PDF.js provides a companion text layer that overlays invisible spans on top of the canvas, matching the rendered glyph positions. The TextLayer class handles that wiring.
import { TextLayer } from 'pdfjs-dist';
async function renderWithTextLayer(pdf, pageNumber, canvas, textLayerDiv) {
  const page = await pdf.getPage(pageNumber);
  const viewport = page.getViewport({ scale: 1.5 });
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({
    canvasContext: canvas.getContext('2d'),
    viewport,
  }).promise;
  // Stretch text layer div to match canvas dimensions
  textLayerDiv.style.width = `${viewport.width}px`;
  textLayerDiv.style.height = `${viewport.height}px`;
  const textContent = await page.getTextContent();
  const textLayer = new TextLayer({
    textContentSource: textContent,
    container: textLayerDiv,
    viewport,
  });
  await textLayer.render();
}
Style the container with position: absolute; inset: 0; opacity: 0.2; during development so you can verify alignment. In production, hide the glyphs by making the span text transparent (color: transparent) rather than dropping the layer's opacity to 0, which would also hide the selection highlight, and keep pointer events active.
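For search, one simple approach is to scan the per-page strings produced by the earlier extractText() function. The sketch below is only that: findMatches is a hypothetical helper, matching is a plain case-insensitive substring scan, and the returned indexes refer to positions in the lowercased page text.

```javascript
// Hypothetical helper: pageTexts is an array of strings, one per page
function findMatches(pageTexts, query) {
  const needle = query.toLowerCase();
  const matches = [];
  if (!needle) return matches; // an empty query matches nothing

  pageTexts.forEach((text, i) => {
    const haystack = text.toLowerCase();
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      matches.push({ page: i + 1, index: from }); // pages are 1-based
      // Continue after this hit so matches do not overlap
      from = haystack.indexOf(needle, from + needle.length);
    }
  });
  return matches;
}

// Possible usage:
//   const texts = await Promise.all(pageNumbers.map((n) => extractText(pdf, n)));
//   const hits = findMatches(texts, 'invoice');
```

Mapping a hit back to highlighted spans in the text layer takes extra bookkeeping that this sketch leaves out.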
Common Pitfalls
- Worker path mismatches. Setting workerSrc to a URL that returns 404 shows up as "Setting up fake worker failed" warnings plus slow rendering. Verify the URL returns the worker JS with a 200 status code.
- CORS on remote PDFs. Fetching a PDF from another origin fails unless the server responds with Access-Control-Allow-Origin. Proxy the file through your own backend if you cannot control the source server.
- Font substitution warnings. PDFs with missing embedded fonts fall back to standard replacements. The standardFontDataUrl option in getDocument() points the library at a folder of Type1 substitutes for cleaner output.
- Large files crashing tabs. A 200 MB PDF parses fine but rendering every page up front exhausts memory. Render lazily, call page.cleanup() after each page, and call pdf.destroy() when the document closes.
- Async lifecycle races. A user clicking Next twice in quick succession can start two render tasks on the same canvas. Keep a reference to the active RenderTask and call .cancel() before starting a new one.
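The last pitfall can be handled with a small wrapper around page.render(). This is a minimal sketch, assuming a cancelled task rejects with an error whose name is RenderingCancelledException; createRenderQueue and renderLatest are hypothetical names, not library APIs.

```javascript
// Hypothetical wrapper that keeps at most one render task alive per canvas
function createRenderQueue() {
  let active = null; // the RenderTask currently in flight, if any

  return async function renderLatest(page, params) {
    // Cancel whatever is still drawing before starting a new task
    if (active) active.cancel();

    const task = page.render(params);
    active = task;
    try {
      await task.promise;
      return true; // finished normally
    } catch (err) {
      // A cancelled render is expected here, not an error
      if (err && err.name === 'RenderingCancelledException') return false;
      throw err;
    } finally {
      // Clear the slot only if a newer task has not replaced this one
      if (active === task) active = null;
    }
  };
}

// Possible usage inside a viewer:
//   const renderLatest = createRenderQueue();
//   await renderLatest(page, { canvasContext: ctx, viewport });
```

The boolean return value lets callers skip follow-up work, such as building a text layer, for renders that were superseded.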
Alternatives Comparison
PDF.js focuses on reading: parsing and rendering existing documents. If you need to create or modify PDFs, look elsewhere.
| Library | Primary Use | Runtime | Strengths |
|---|---|---|---|
| pdfjs-dist | Reading, rendering | Browser + Node | Canvas render, text extraction, official Mozilla project |
| PDFium (via WASM bindings) | Reading, rendering | Native / WASM | Google Chrome engine, fast, bindings like pdfium-wasm exist |
| pdf-lib | Writing, editing | Browser + Node | Create from scratch, modify pages, fill forms |
Mix pdfjs-dist with pdf-lib when an app needs both — render existing pages for preview, then use pdf-lib to stamp or append before download. Typical pairing: user uploads a contract, pdfjs-dist renders it in the browser so the user can review, then pdf-lib writes their signature image onto the correct page before the final file is uploaded back to the server.
pdfjs-dist also runs in Node through the @napi-rs/canvas optional dependency. Server-side rendering of PDF thumbnails, print-preview generation, and automated document testing all fit this mode. Performance is slower than a headless browser using the native engine, but the API surface is identical to the browser build.
References
The official sources listed at the top of this post stay current with each release. The GitHub repo includes the examples/ folder with runnable demos for text search, annotations, and form fields that go beyond the scope of this tutorial. The API reference on the official website documents every public type with examples, which is the fastest way to look up a method signature during development.