FPDFText_CountChars

FPDFText_CountChars(text_page)

Description

Gets the number of characters on a PDF text page. This function is typically called before allocating a buffer for text extraction, because it tells you how much space you’ll need to hold all the text on the page.

Prerequisites

The examples below use the initializePdfium helper function from our Getting Started guide. Include that function in your code before running them.

Parameters

Name	Type	Description
text_page	number	A text page handle obtained from `FPDFText_LoadPage`.

Return Value

Returns the number of characters on the page (including whitespace characters), or -1 on error, for example when the text_page handle is invalid.

A return value of 0 is not an error. It indicates that the page exists but contains no text characters.
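
The distinction between -1, 0, and a positive count can be captured in a small helper. This is a minimal sketch: `classifyCharCount` and `CharCountStatus` are hypothetical names introduced here for illustration and are not part of PDFium.

```typescript
// Hypothetical helper: not part of PDFium or the wrapper used in this guide.
type CharCountStatus = 'error' | 'empty' | 'has-text';

function classifyCharCount(charCount: number): CharCountStatus {
  if (charCount < 0) {
    // -1 signals an error, such as an invalid text page handle.
    return 'error';
  }
  // 0 is a valid result: the page loaded but has no text characters.
  return charCount === 0 ? 'empty' : 'has-text';
}
```

Keeping the error case separate from the empty case lets callers decide whether to throw, skip the page, or fall back to something like OCR.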

Example


// Note: The initializePdfium function is a helper that initializes the PDFium library.
// For the full implementation, see: /docs/pdfium/getting-started
import { initializePdfium } from './initialize-pdfium';
 
async function getPageCharCount(pdfData: Uint8Array, pageIndex: number): Promise<number> {
  // Initialize PDFium
  const pdfium = await initializePdfium();
  
  // Load the PDF document
  const filePtr = pdfium.pdfium.wasmExports.malloc(pdfData.length);
  pdfium.pdfium.HEAPU8.set(pdfData, filePtr);
  const docPtr = pdfium.FPDF_LoadMemDocument(filePtr, pdfData.length, 0);
  
  if (!docPtr) {
    const error = pdfium.FPDF_GetLastError();
    pdfium.pdfium.wasmExports.free(filePtr);
    throw new Error(`Failed to load PDF: ${error}`);
  }
  
  try {
    // Check if the page index is valid
    const pageCount = pdfium.FPDF_GetPageCount(docPtr);
    if (pageIndex < 0 || pageIndex >= pageCount) {
      throw new Error(`Invalid page index: ${pageIndex}. Document has ${pageCount} pages.`);
    }
    
    // Load the PDF page
    const pagePtr = pdfium.FPDF_LoadPage(docPtr, pageIndex);
    if (!pagePtr) {
      throw new Error(`Failed to load page ${pageIndex}`);
    }
    
    try {
      // Create a text page object
      const textPagePtr = pdfium.FPDFText_LoadPage(pagePtr);
      if (!textPagePtr) {
        throw new Error(`Failed to load text for page ${pageIndex}`);
      }
      
      try {
        // Get the character count
        const charCount = pdfium.FPDFText_CountChars(textPagePtr);
        
        if (charCount === -1) {
          throw new Error(`Error getting character count for page ${pageIndex}`);
        }
        
        return charCount;
      } finally {
        // Clean up text page
        pdfium.FPDFText_ClosePage(textPagePtr);
      }
    } finally {
      // Clean up PDF page
      pdfium.FPDF_ClosePage(pagePtr);
    }
  } finally {
    // Clean up document
    pdfium.FPDF_CloseDocument(docPtr);
    pdfium.pdfium.wasmExports.free(filePtr);
  }
}
 
// Usage
fetch('sample.pdf')
  .then(response => response.arrayBuffer())
  .then(buffer => getPageCharCount(new Uint8Array(buffer), 0))
  .then(count => {
    if (count === 0) {
      console.log('The page exists but contains no text.');
    } else {
      console.log(`The page contains ${count} characters.`);
    }
  })
  .catch(error => console.error('Error:', error));

Usage Examples

Allocating a buffer for text extraction


// Get the character count
const charCount = pdfium.FPDFText_CountChars(textPagePtr);
if (charCount <= 0) {
  return ''; // No text or error
}
 
// Allocate a buffer for the text (+1 for null terminator)
const bufferSize = (charCount + 1) * 2; // UTF-16, 2 bytes per character
const textBufferPtr = pdfium.pdfium.wasmExports.malloc(bufferSize);
 
try {
  // Extract text into the buffer
  pdfium.FPDFText_GetText(textPagePtr, 0, charCount, textBufferPtr);
  // ...
} finally {
  // Clean up buffer
  pdfium.pdfium.wasmExports.free(textBufferPtr);
}
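
After `FPDFText_GetText` fills the buffer, the bytes still have to be decoded into a JavaScript string. The sketch below assumes the buffer holds UTF-16LE text followed by a null terminator, and that the value returned by `FPDFText_GetText` counts the characters written including that terminator. `decodeUtf16Le` is a hypothetical helper, and `heap` stands in for `pdfium.pdfium.HEAPU8`.

```typescript
// Hypothetical helper: decodes UTF-16LE text from a byte view of WASM memory.
// `heap` stands in for pdfium.pdfium.HEAPU8 and `ptr` for textBufferPtr.
function decodeUtf16Le(heap: Uint8Array, ptr: number, charsWritten: number): string {
  // Assumption: charsWritten includes the trailing null terminator, so drop it.
  const textChars = Math.max(charsWritten - 1, 0);
  // Each UTF-16 code unit occupies 2 bytes.
  const bytes = heap.subarray(ptr, ptr + textChars * 2);
  return new TextDecoder('utf-16le').decode(bytes);
}

// Self-contained demonstration with a fake heap holding "Hi" plus a terminator.
const fakeHeap = new Uint8Array(16);
fakeHeap.set([0x48, 0x00, 0x69, 0x00, 0x00, 0x00], 4);
const decoded = decodeUtf16Le(fakeHeap, 4, 3);
console.log(decoded); // "Hi"
```

In the snippet above, the call would look like `decodeUtf16Le(pdfium.pdfium.HEAPU8, textBufferPtr, written)`, where `written` is the value returned by `FPDFText_GetText`. The decode has to happen before the buffer is freed.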

Determining if a page contains text


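// Returns true only when the page has at least one character.
// Both an empty page (0) and an error (-1) yield false.
function pageHasText(pdfium, textPagePtr) {
  const charCount = pdfium.FPDFText_CountChars(textPagePtr);
  return charCount > 0;
}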

Best Practices

Check for errors: Always check if FPDFText_CountChars returns -1, which indicates an error.
Handle empty pages: A return value of 0 is valid and means the page exists but contains no text. Your code should handle this case gracefully.
Buffer allocation: When allocating memory for text extraction, always add 1 to the character count to accommodate the null terminator required by PDFium, then multiply by 2 because the text is written as UTF-16.
Use for capacity planning: This function is useful for determining how much memory to allocate for text extraction, allowing you to avoid buffer overflows.
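
The buffer-size rule from the list above can be wrapped in one place so the +1 and the 2-byte factor are never forgotten. This is a sketch only: `textBufferByteSize` and `BYTES_PER_UTF16_UNIT` are hypothetical names, not PDFium APIs.

```typescript
// Hypothetical helper: computes the byte size of a buffer for FPDFText_GetText.
const BYTES_PER_UTF16_UNIT = 2;

function textBufferByteSize(charCount: number): number {
  if (!Number.isInteger(charCount) || charCount < 0) {
    // A negative count (-1) means FPDFText_CountChars reported an error.
    throw new RangeError(`Invalid character count: ${charCount}`);
  }
  // +1 leaves room for the null terminator.
  return (charCount + 1) * BYTES_PER_UTF16_UNIT;
}

console.log(textBufferByteSize(0));   // 2
console.log(textBufferByteSize(120)); // 242
```

Rejecting negative counts up front means an error value from `FPDFText_CountChars` cannot silently turn into a zero-byte or undersized allocation.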

Common Issues

Invalid text page handle: If you pass an invalid text page handle to FPDFText_CountChars, it will return -1.
Confusing character count with byte length: FPDFText_CountChars returns a number of characters, not bytes. PDFium uses UTF-16LE encoding for text, so each character typically requires 2 bytes of storage. When allocating memory for text extraction, allocate (charCount + 1) * 2 bytes, where the extra character is the null terminator.
Scanned documents: PDFs created from scanned images won’t have text content unless OCR has been applied. FPDFText_CountChars will return 0 for these pages.
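
One way to spot likely scanned pages is to walk the document and collect every page whose character count is 0. The sketch below is an illustration under stated assumptions: `TextCountApi` is a hypothetical interface listing only the wrapper functions used in the examples above, with handles modeled as plain numbers, and `findPagesWithoutText` is a hypothetical helper.

```typescript
// Hypothetical interface: a minimal subset of the wrapper used in this guide.
// Only functions that appear in the examples above are listed.
interface TextCountApi {
  FPDF_GetPageCount(docPtr: number): number;
  FPDF_LoadPage(docPtr: number, pageIndex: number): number;
  FPDF_ClosePage(pagePtr: number): void;
  FPDFText_LoadPage(pagePtr: number): number;
  FPDFText_ClosePage(textPagePtr: number): void;
  FPDFText_CountChars(textPagePtr: number): number;
}

// Hypothetical helper: returns the indices of pages with no extractable text.
function findPagesWithoutText(api: TextCountApi, docPtr: number): number[] {
  const emptyPages: number[] = [];
  const pageCount = api.FPDF_GetPageCount(docPtr);
  for (let i = 0; i < pageCount; i++) {
    const pagePtr = api.FPDF_LoadPage(docPtr, i);
    if (!pagePtr) continue; // Skip pages that fail to load.
    try {
      const textPagePtr = api.FPDFText_LoadPage(pagePtr);
      if (!textPagePtr) continue; // finally below still closes the page.
      try {
        // 0 means no text; negative values (errors) are not reported as empty.
        if (api.FPDFText_CountChars(textPagePtr) === 0) emptyPages.push(i);
      } finally {
        api.FPDFText_ClosePage(textPagePtr);
      }
    } finally {
      api.FPDF_ClosePage(pagePtr);
    }
  }
  return emptyPages;
}
```

A page reported here is only a candidate for OCR: a count of 0 says the page has no text characters, not why.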

Related Functions

FPDFText_LoadPage - Load a page for text extraction
FPDFText_GetText - Extract text from a page
FPDFText_ClosePage - Close a text page and release resources