DocsPDFium JavaScript APIIntroduction

PDFium JavaScript API

Introduction

The @embedpdf/pdfium library provides a powerful JavaScript interface to PDFium, enabling high-quality PDF rendering and manipulation directly in web applications. This library brings native-quality PDF capabilities to the browser through WebAssembly, without requiring any server-side processing.

What is PDFium?

PDFium is an open-source PDF rendering engine originally developed by Foxit Software and later released as open source by Google. Written in C++, it’s the same engine that powers PDF viewing in Chrome and numerous other applications. PDFium offers comprehensive PDF capabilities including:

  • High-fidelity rendering of PDF pages
  • Text extraction and search
  • Form filling and manipulation
  • Annotation support
  • Digital signature verification
  • PDF modification and creation

Why PDFium?

We chose PDFium as our engine for several compelling reasons:

  1. Industry-proven reliability: As the PDF engine behind Chrome and many commercial applications, PDFium has been battle-tested on billions of documents.
  2. Comprehensive feature set: PDFium supports the full PDF specification, including complex features like forms, annotations, and digital signatures.
  3. Active development: Being maintained by Google and the open-source community ensures ongoing improvements and security updates.
  4. Performance: PDFium is optimized for speed and memory efficiency, essential for web applications.

WebAssembly and Emscripten

WebAssembly (WASM) is a binary instruction format that enables high-performance execution of code in web browsers. It serves as a portable compilation target for languages like C/C++, allowing them to run at near-native speed in the browser. WebAssembly code executes in a memory-safe, sandboxed environment and can seamlessly interoperate with JavaScript.

Emscripten is a toolchain that compiles C and C++ code to WebAssembly. It provides the necessary infrastructure to port complex native applications to the web, including:

  1. A complete compiler toolchain based on LLVM
  2. Runtime environment that emulates key parts of a native system
  3. Glue code generation to bridge between JavaScript and compiled code
  4. Memory management utilities

Through Emscripten, we’ve compiled the entire PDFium C++ codebase to WebAssembly, making it available for JavaScript developers while maintaining the performance benefits of the original native code.

Installation

npm i @embedpdf/pdfium

Basic Usage

import { init, WrappedPdfiumModule } from '@embedpdf/pdfium'; const pdfiumWasm = 'https://cdn.jsdelivr.net/npm/@embedpdf/pdfium/dist/pdfium.wasm'; let pdfiumInstance: WrappedPdfiumModule | null = null; async function initializePdfium() { if (pdfiumInstance) return pdfiumInstance; const response = await fetch(pdfiumWasm); const wasmBinary = await response.arrayBuffer(); pdfiumInstance = await init({ wasmBinary }); // Initialize the PDFium extension library // This is required before performing any PDF operations pdfiumInstance.PDFiumExt_Init(); return pdfiumInstance; } const pdfium = await initializePdfium();

PDFiumExt_Init Initialization

The PDFiumExt_Init() function call is a crucial step in the initialization process. This function sets up the PDFium extension library with the necessary configurations for proper operation in a WebAssembly environment. It must be called after the WebAssembly module is initialized but before any PDF operations are performed.

PDFiumExt_Init handles:

  • Configuring memory allocators
  • Setting up error handling
  • Preparing rendering subsystems
  • Initializing font handling

Omitting this call may cause operations to fail, produce incorrect results, or even crash the application.

Understanding WebAssembly and Memory Management

PDFium is a C++ library compiled to WebAssembly (WASM), which allows it to run in web browsers. When you use @embedpdf/pdfium, you’re interacting with this compiled code through JavaScript.

Memory Management Basics

When working with PDFium, you need to understand a few key concepts:

  1. Memory Allocation: You need to allocate memory for data you want to pass to PDFium.
  2. Pointers: These are references to locations in memory.
  3. Memory Cleanup: You must free allocated memory when you’re done to prevent memory leaks.
// Allocate memory const ptr = pdfium.pdfium.wasmExports.malloc(size); // Use the memory... // Free memory when done pdfium.pdfium.wasmExports.free(ptr);

Core Concepts

Document Handling

PDFium operations typically follow this workflow:

  1. Initialize PDFium (with PDFiumExt_Init)
  2. Load a document
  3. Perform operations (render pages, extract text, etc.)
  4. Close the document and free resources

Function Categories

The PDFium API consists of functions with different purposes:

  • Document functions: Open, close, and manage PDF documents
  • Page functions: Load, render, and manipulate pages
  • Text functions: Extract and search text
  • Annotation functions: Work with PDF annotations
  • Form functions: Interact with PDF forms
  • Bookmark functions: Work with PDF bookmarks

Best Practices

  1. Always call PDFiumExt_Init: Call this function once after initializing the WebAssembly module and before any PDF operations.
  2. Always free memory: Use pdfium.pdfium.wasmExports.free() to release memory allocated with malloc.
  3. Close resources: Always close pages, text pages, and documents when you’re done with them.
  4. Use try/finally blocks: Ensure resources are properly cleaned up even if errors occur.
  5. Check return values: Many PDFium functions return 0 or null on failure.
  6. Handle errors gracefully: Use FPDF_GetLastError() to get more information about failures.
Last updated on March 19, 2025