FPDF_LoadCustomDocument
FPDF_LoadCustomDocument(FPDF_FILEACCESS* pFileAccess, FPDF_BYTESTRING password)
Description
Loads a PDF document from a custom access mechanism. This function is particularly useful for:
- Implementing streaming or on-demand loading of large PDF files
- Custom file access (e.g., from a database, cloud storage, or encrypted sources)
- Range requests or partial loading of PDF content
- Progressive loading of PDF files in web applications
The custom access mechanism is defined by the caller through a structure containing a length field and a callback function to read blocks of data.
Prerequisites
This example uses initialization approaches from our Getting Started guide. Note that the implementation is currently limited by browser capabilities as explained in the “Implementation Status” section below.
Parameters
Name | Type | Description |
---|---|---|
pFileAccess | FPDF_FILEACCESS* | Pointer to a structure that contains information about the file and a custom read function. |
password | FPDF_BYTESTRING | Optional password for encrypted documents (null-terminated UTF-8 string). Use NULL if no password is needed. |
FPDF_FILEACCESS Structure
The FPDF_FILEACCESS
structure contains:
typedef struct {
unsigned long m_FileLen; // File length, in bytes
int (*m_GetBlock)(void* param, unsigned long position, unsigned char* pBuf, unsigned long size);
void* m_Param; // User-defined parameter for m_GetBlock
} FPDF_FILEACCESS;
The m_GetBlock
function should:
- Return data at the specified position within the file
- Fill the buffer with the requested data
- Return the number of bytes successfully read
- Return 0 if reading fails
- Important: This function must be synchronous - it must return immediately with the requested data
Return Value
Returns a handle to the loaded document, or NULL
on failure. Use FPDF_GetLastError()
to retrieve the error code if loading fails.
Implementation Status: Awaiting JSPI
Current Limitations
Implementing FPDF_LoadCustomDocument
in a browser environment currently faces a significant challenge: PDFium requires a synchronous callback for m_GetBlock
, but modern browsers primarily support asynchronous APIs for fetching data (like fetch()
and Promises).
The traditional workaround using synchronous XMLHttpRequest is no longer recommended:
- Synchronous XHR is deprecated
- It blocks the main thread, causing UI freezes
- It’s being removed from browsers in certain contexts
The Future: JavaScript Promise Integration (JSPI)
The WebAssembly community is addressing this issue with JavaScript Promise Integration (JSPI), a proposal that will bridge the gap between synchronous WebAssembly code and asynchronous JavaScript APIs.
JSPI will allow WebAssembly applications to:
- Call asynchronous JavaScript APIs using synchronous code patterns
- Suspend execution when a Promise is encountered
- Resume execution once the Promise resolves
- Maintain the appearance of synchronous code flow
Once JSPI is fully implemented in browsers, implementing efficient streaming PDF loading with FPDF_LoadCustomDocument
will become much more straightforward.
Current Status of JSPI
- JSPI is currently in Chrome Origin Trial
- It’s in Phase 3 of the WebAssembly standardization process
- Full standardization is expected by the end of 2025
- You can read more about it in this V8 blog post
Basic Implementation Example
Until JSPI is widely available, here’s a conceptual overview of how FPDF_LoadCustomDocument
would be implemented:
// Note: This is a conceptual implementation that requires JSPI to work properly
// For initialization details, see: /docs/pdfium/getting-started
import { init } from '@embedpdf/pdfium';
async function loadPdfWithCustomAccess(url: string) {
// Initialize PDFium
const pdfium = await init();
pdfium.PDFiumExt_Init();
// 1. First load the file size with a HEAD request
const response = await fetch(url, { method: 'HEAD' });
const fileSize = parseInt(response.headers.get('content-length') || '0');
// 2. Set up FPDF_FILEACCESS structure
const fileAccessPtr = pdfium.pdfium.wasmExports.malloc(12);
pdfium.pdfium.HEAPU32[fileAccessPtr / 4] = fileSize;
// 3. Create read callback (this would leverage JSPI in the future)
const readBlock = (fileAccess, position, pBuf, size) => {
// This function needs to synchronously return the requested data
// With JSPI, this could use await fetch() in the future
// For now, this is where the implementation challenge exists
// Simplified placeholder implementation:
return 0; // Indicates read failure
};
// 4. Set up callback pointer
const readBlockPtr = pdfium.pdfium.addFunction(readBlock, 'iiiii');
pdfium.pdfium.HEAPU32[fileAccessPtr / 4 + 1] = readBlockPtr;
pdfium.pdfium.HEAPU32[fileAccessPtr / 4 + 2] = 0; // User data
// 5. Attempt to load the document
const docPtr = pdfium.FPDF_LoadCustomDocument(fileAccessPtr, 0);
// 6. Check for errors and clean up
if (!docPtr) {
const errorCode = pdfium.FPDF_GetLastError();
pdfium.pdfium.removeFunction(readBlockPtr);
pdfium.pdfium.wasmExports.free(fileAccessPtr);
throw new Error(`Failed to load PDF: error code ${errorCode}`);
}
// 7. Return document with cleanup method
return {
docPtr,
pageCount: pdfium.FPDF_GetPageCount(docPtr),
close: () => {
pdfium.FPDF_CloseDocument(docPtr);
pdfium.pdfium.removeFunction(readBlockPtr);
pdfium.pdfium.wasmExports.free(fileAccessPtr);
}
};
}
Implementation Strategies (Future)
When JSPI becomes widely available, consider these approaches for implementing a custom document loader:
-
Chunked Loading:
- Load chunks on demand as PDFium requests data
- Cache chunks to avoid redundant network requests
- Track which chunks are loaded for progress reporting
-
Parallel Prefetching:
- Start fetching nearby chunks when PDFium requests data
- Predict which chunks might be needed next based on access patterns
- Implement a cache eviction policy for very large files
-
Range Request Optimization:
- Merge adjacent range requests to reduce network overhead
- Use HTTP range requests (
Range: bytes=start-end
) for efficient partial loading - Group small nearby requests into a single larger request
Best Practices
-
Error Handling: Always check the return value of
FPDF_LoadCustomDocument
and useFPDF_GetLastError()
to determine what went wrong if loading fails. -
Resource Cleanup: Call
removeFunction()
to clean up function references created withaddFunction()
to prevent memory leaks. -
Memory Management: Clean up all resources when done, including the FPDF_FILEACCESS structure.
-
Workarounds: Until JSPI is available, consider alternative approaches like:
- Pre-loading the entire PDF file instead of streaming (for smaller files)
- Using Web Workers with transferable objects to reduce main thread blocking
Related Functions
- FPDF_LoadMemDocument - Load a PDF document from memory
- FPDF_GetLastError - Get the error code for the last failed operation
- FPDF_CloseDocument - Close a loaded PDF document