FPDF_LoadCustomDocument

FPDF_LoadCustomDocument(FPDF_FILEACCESS* pFileAccess, FPDF_BYTESTRING password)

Description

Loads a PDF document from a custom access mechanism. This function is particularly useful for:

Implementing streaming or on-demand loading of large PDF files
Custom file access (e.g., from a database, cloud storage, or encrypted sources)
Range requests or partial loading of PDF content
Progressive loading of PDF files in web applications

The custom access mechanism is defined by the caller through a structure containing a length field and a callback function to read blocks of data.

Prerequisites

This example uses initialization approaches from our Getting Started guide. Note that the implementation is currently limited by browser capabilities as explained in the “Implementation Status” section below.

Parameters

Name	Type	Description
`pFileAccess`	`FPDF_FILEACCESS*`	Pointer to a structure that contains information about the file and a custom read function.
`password`	`FPDF_BYTESTRING`	Optional password for encrypted documents (null-terminated UTF-8 string). Use `NULL` if no password is needed.

FPDF_FILEACCESS Structure

The FPDF_FILEACCESS structure contains:


typedef struct {
  unsigned long m_FileLen;      // File length, in bytes
  int (*m_GetBlock)(void* param, unsigned long position, unsigned char* pBuf, unsigned long size);
  void* m_Param;                // User-defined parameter for m_GetBlock
} FPDF_FILEACCESS;

The m_GetBlock function should:

Return data at the specified position within the file
Fill the buffer with the requested data
Return the number of bytes successfully read
Return 0 if reading fails
Important: This function must be synchronous - it must return immediately with the requested data

Return Value

Returns a handle to the loaded document, or NULL on failure. Use FPDF_GetLastError() to retrieve the error code if loading fails.

Implementation Status: Awaiting JSPI

Current Limitations

Implementing FPDF_LoadCustomDocument in a browser environment currently faces a significant challenge: PDFium requires a synchronous callback for m_GetBlock, but modern browsers primarily support asynchronous APIs for fetching data (like fetch() and Promises).

The traditional workaround using synchronous XMLHttpRequest is no longer recommended:

Synchronous XHR is deprecated
It blocks the main thread, causing UI freezes
It’s being removed from browsers in certain contexts

The Future: JavaScript Promise Integration (JSPI)

The WebAssembly community is addressing this issue with JavaScript Promise Integration (JSPI), a proposal that will bridge the gap between synchronous WebAssembly code and asynchronous JavaScript APIs.

JSPI will allow WebAssembly applications to:

Call asynchronous JavaScript APIs using synchronous code patterns
Suspend execution when a Promise is encountered
Resume execution once the Promise resolves
Maintain the appearance of synchronous code flow

Once JSPI is fully implemented in browsers, implementing efficient streaming PDF loading with FPDF_LoadCustomDocument will become much more straightforward.

Current Status of JSPI

JSPI is currently in Chrome Origin Trial
It’s in Phase 3 of the WebAssembly standardization process
Full standardization is expected by the end of 2025
You can read more about it in this V8 blog post

Basic Implementation Example

Until JSPI is widely available, here’s a conceptual overview of how FPDF_LoadCustomDocument would be implemented:


// Note: This is a conceptual implementation that requires JSPI to work properly
// For initialization details, see: /docs/pdfium/getting-started
import { init } from '@embedpdf/pdfium';
 
async function loadPdfWithCustomAccess(url: string) {
  // Initialize PDFium
  const pdfium = await init();
  pdfium.PDFiumExt_Init();
  
  // 1. First load the file size with a HEAD request
  const response = await fetch(url, { method: 'HEAD' });
  const fileSize = parseInt(response.headers.get('content-length') || '0');
  
  // 2. Set up FPDF_FILEACCESS structure
  const fileAccessPtr = pdfium.pdfium.wasmExports.malloc(12);
  pdfium.pdfium.HEAPU32[fileAccessPtr / 4] = fileSize;
  
  // 3. Create read callback (this would leverage JSPI in the future)
  const readBlock = (fileAccess, position, pBuf, size) => {
    // This function needs to synchronously return the requested data
    // With JSPI, this could use await fetch() in the future
    // For now, this is where the implementation challenge exists
    
    // Simplified placeholder implementation:
    return 0; // Indicates read failure
  };
  
  // 4. Set up callback pointer
  const readBlockPtr = pdfium.pdfium.addFunction(readBlock, 'iiiii');
  pdfium.pdfium.HEAPU32[fileAccessPtr / 4 + 1] = readBlockPtr;
  pdfium.pdfium.HEAPU32[fileAccessPtr / 4 + 2] = 0; // User data
  
  // 5. Attempt to load the document
  const docPtr = pdfium.FPDF_LoadCustomDocument(fileAccessPtr, 0);
  
  // 6. Check for errors and clean up
  if (!docPtr) {
    const errorCode = pdfium.FPDF_GetLastError();
    pdfium.pdfium.removeFunction(readBlockPtr);
    pdfium.pdfium.wasmExports.free(fileAccessPtr);
    throw new Error(`Failed to load PDF: error code ${errorCode}`);
  }
  
  // 7. Return document with cleanup method
  return {
    docPtr,
    pageCount: pdfium.FPDF_GetPageCount(docPtr),
    close: () => {
      pdfium.FPDF_CloseDocument(docPtr);
      pdfium.pdfium.removeFunction(readBlockPtr);
      pdfium.pdfium.wasmExports.free(fileAccessPtr);
    }
  };
}

Implementation Strategies (Future)

When JSPI becomes widely available, consider these approaches for implementing a custom document loader:

Chunked Loading:
- Load chunks on demand as PDFium requests data
- Cache chunks to avoid redundant network requests
- Track which chunks are loaded for progress reporting
Parallel Prefetching:
- Start fetching nearby chunks when PDFium requests data
- Predict which chunks might be needed next based on access patterns
- Implement a cache eviction policy for very large files
Range Request Optimization:
- Merge adjacent range requests to reduce network overhead
- Use HTTP range requests (Range: bytes=start-end) for efficient partial loading
- Group small nearby requests into a single larger request

Best Practices

Error Handling: Always check the return value of FPDF_LoadCustomDocument and use FPDF_GetLastError() to determine what went wrong if loading fails.
Resource Cleanup: Call removeFunction() to clean up function references created with addFunction() to prevent memory leaks.
Memory Management: Clean up all resources when done, including the FPDF_FILEACCESS structure.
Workarounds: Until JSPI is available, consider alternative approaches like:
- Pre-loading the entire PDF file instead of streaming (for smaller files)
- Using Web Workers with transferable objects to reduce main thread blocking

FPDF_LoadMemDocument - Load a PDF document from memory
FPDF_GetLastError - Get the error code for the last failed operation
FPDF_CloseDocument - Close a loaded PDF document