FPDFText_CountChars

FPDFText_CountChars(text_page)

Description

Gets the number of characters in a PDF page. This function is typically called before allocating a buffer for text extraction, because the character count determines how large that buffer must be to hold all the text on the page.

Prerequisites

The examples below use the initializePdfium helper function from our Getting Started guide. Make sure to include this function in your code before trying them.

Parameters

Name         Type      Description
text_page    number    A text page handle obtained from FPDFText_LoadPage.

Return Value

Returns the number of characters on the page, or -1 on error. The count includes whitespace as well as characters that PDFium generates itself, such as additional spaces and line breaks. The most common error case is:

  • The text_page handle is invalid

A page with no text content is not an error. A return value of 0 indicates that the page exists but contains no text characters.
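
One way to keep the three outcomes (error, empty page, page with text) apart in calling code is to map the raw return value onto a small tagged type. This is a minimal sketch: CharCountResult and interpretCharCount are hypothetical names introduced for illustration and are not part of PDFium or its wrapper.

```typescript
// Hypothetical helper, not part of PDFium. It only interprets the number
// that FPDFText_CountChars returned.
type CharCountResult =
  | { kind: 'error' }                 // -1 (or any negative value)
  | { kind: 'empty' }                 // 0: page loaded, no text characters
  | { kind: 'text'; count: number };  // > 0: number of characters

function interpretCharCount(raw: number): CharCountResult {
  if (raw < 0) return { kind: 'error' };
  if (raw === 0) return { kind: 'empty' };
  return { kind: 'text', count: raw };
}
```

A caller can then switch on kind instead of comparing against -1 and 0 in several places.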

Example

// Note: The initializePdfium function is a helper that initializes the PDFium library.
// For the full implementation, see: /docs/pdfium/getting-started
import { initializePdfium } from './initialize-pdfium';

async function getPageCharCount(pdfData: Uint8Array, pageIndex: number): Promise<number> {
  // Initialize PDFium
  const pdfium = await initializePdfium();

  // Load the PDF document
  const filePtr = pdfium.pdfium.wasmExports.malloc(pdfData.length);
  pdfium.pdfium.HEAPU8.set(pdfData, filePtr);
  const docPtr = pdfium.FPDF_LoadMemDocument(filePtr, pdfData.length, 0);

  if (!docPtr) {
    const error = pdfium.FPDF_GetLastError();
    pdfium.pdfium.wasmExports.free(filePtr);
    throw new Error(`Failed to load PDF: ${error}`);
  }

  try {
    // Check if the page index is valid
    const pageCount = pdfium.FPDF_GetPageCount(docPtr);
    if (pageIndex < 0 || pageIndex >= pageCount) {
      throw new Error(`Invalid page index: ${pageIndex}. Document has ${pageCount} pages.`);
    }

    // Load the PDF page
    const pagePtr = pdfium.FPDF_LoadPage(docPtr, pageIndex);
    if (!pagePtr) {
      throw new Error(`Failed to load page ${pageIndex}`);
    }

    try {
      // Create a text page object
      const textPagePtr = pdfium.FPDFText_LoadPage(pagePtr);
      if (!textPagePtr) {
        throw new Error(`Failed to load text for page ${pageIndex}`);
      }

      try {
        // Get the character count
        const charCount = pdfium.FPDFText_CountChars(textPagePtr);
        if (charCount === -1) {
          throw new Error(`Error getting character count for page ${pageIndex}`);
        }
        return charCount;
      } finally {
        // Clean up text page
        pdfium.FPDFText_ClosePage(textPagePtr);
      }
    } finally {
      // Clean up PDF page
      pdfium.FPDF_ClosePage(pagePtr);
    }
  } finally {
    // Clean up document
    pdfium.FPDF_CloseDocument(docPtr);
    pdfium.pdfium.wasmExports.free(filePtr);
  }
}

// Usage
fetch('sample.pdf')
  .then(response => response.arrayBuffer())
  .then(buffer => getPageCharCount(new Uint8Array(buffer), 0))
  .then(count => {
    if (count === 0) {
      console.log('The page exists but contains no text.');
    } else {
      console.log(`The page contains ${count} characters.`);
    }
  })
  .catch(error => console.error('Error:', error));

Usage Examples

Allocating a buffer for text extraction

// Get the character count
const charCount = pdfium.FPDFText_CountChars(textPagePtr);
if (charCount <= 0) {
  return ''; // No text or error
}

// Allocate a buffer for the text (+1 for null terminator)
const bufferSize = (charCount + 1) * 2; // UTF-16, 2 bytes per character
const textBufferPtr = pdfium.pdfium.wasmExports.malloc(bufferSize);

try {
  // Extract text into the buffer
  pdfium.FPDFText_GetText(textPagePtr, 0, charCount, textBufferPtr);
  // ...
} finally {
  // Clean up buffer
  pdfium.pdfium.wasmExports.free(textBufferPtr);
}
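
The step elided as "// ..." above usually reads the UTF-16LE data back out of WASM memory before the buffer is freed. The following is a rough sketch of one way to do that. readUtf16LE is a hypothetical helper, and its heap parameter is assumed to be the same byte view the examples on this page access as pdfium.pdfium.HEAPU8. It relies on FPDFText_GetText returning the number of characters written, including the trailing terminator.

```typescript
// Hypothetical helper: decodes UTF-16LE text that FPDFText_GetText wrote
// into WASM memory. `heap` stands in for pdfium.pdfium.HEAPU8.
function readUtf16LE(heap: Uint8Array, ptr: number, written: number): string {
  // `written` is FPDFText_GetText's return value, which counts the
  // trailing null terminator; drop it before decoding.
  const units = Math.max(written - 1, 0);
  const view = new DataView(heap.buffer, heap.byteOffset + ptr, units * 2);

  let text = '';
  for (let i = 0; i < units; i++) {
    // `true` selects little-endian byte order.
    text += String.fromCharCode(view.getUint16(i * 2, true));
  }
  return text;
}
```

In the snippet above, this would mean capturing the return value of FPDFText_GetText in a variable and passing it, together with textBufferPtr, to the helper inside the try block, so the read happens before free is called.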

Determining if a page contains text

function pageHasText(pdfium, textPagePtr) {
  const charCount = pdfium.FPDFText_CountChars(textPagePtr);
  return charCount > 0;
}

Best Practices

  1. Check for errors: Always check if FPDFText_CountChars returns -1, which indicates an error.

  2. Handle empty pages: A return value of 0 is valid and means the page exists but contains no text. Your code should handle this case gracefully.

  3. Buffer allocation: When allocating memory for text extraction, always add 1 to the character count to accommodate the null terminator required by PDFium.

  4. Use for capacity planning: This function is useful for determining how much memory to allocate for text extraction, allowing you to avoid buffer overflows.
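
As an illustration of practices 1, 3 and 4 together, the size calculation can live in one small function that rejects the error value and always reserves the terminator. This is only a sketch: textBufferBytes and BYTES_PER_UTF16_UNIT are hypothetical names, not PDFium API.

```typescript
const BYTES_PER_UTF16_UNIT = 2;

// Hypothetical helper: returns the number of bytes to allocate for
// FPDFText_GetText, given the value returned by FPDFText_CountChars.
function textBufferBytes(charCount: number): number {
  if (!Number.isInteger(charCount) || charCount < 0) {
    // Covers the -1 error value as well as non-integer input.
    throw new RangeError(`Invalid character count: ${charCount}`);
  }
  // +1 reserves room for the null terminator PDFium appends.
  return (charCount + 1) * BYTES_PER_UTF16_UNIT;
}
```

For a page with 10 characters this yields 22 bytes, and for an empty page 2 bytes (the terminator alone).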

Common Issues

  1. Invalid text page handle: If you pass an invalid text page handle to FPDFText_CountChars, it will return -1.

  2. Confusing character count with byte length: Remember that PDFium uses UTF-16LE encoding for text, so each character requires 2 bytes of storage. When allocating memory for text extraction, add 1 to the character count for the null terminator and multiply the result by 2.

  3. Scanned documents: PDFs created from scanned images won’t have text content unless OCR has been applied. FPDFText_CountChars will return 0 for these pages.
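
To act on the scanned-document case across a whole file, one option is to collect the character count of every page and list the pages that came back as 0, which are candidates for OCR. The sketch below assumes the counts have already been gathered into an array indexed by page. pagesWithoutText is a hypothetical helper, not part of PDFium.

```typescript
// Hypothetical helper: given one FPDFText_CountChars result per page
// (array index = page index), returns the pages that loaded but have no text.
function pagesWithoutText(charCounts: readonly number[]): number[] {
  const result: number[] = [];
  charCounts.forEach((count, pageIndex) => {
    // Only 0 means "no text"; -1 is an error and is deliberately skipped.
    if (count === 0) result.push(pageIndex);
  });
  return result;
}
```

Pages that returned -1 are left out on purpose, since an error says nothing about whether the page contains text.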

Last updated on March 31, 2025