Extract and export PDF bookmarks using C#

dbx, April 9, 2026
1. Environment Setup

1.1 Install the Free Library

Use the NuGet Package Manager in Visual Studio to install Free Spire.PDF:

Install-Package FreeSpire.PDF

The free version supports basic operations such as reading PDF bookmarks and does not require an additional license file, but it is limited to 10 pages per document.

1.2 Import Namespaces

Add the following namespaces to your code:

using System;
using System.IO;
using System.Text;
using Spire.Pdf;
using Spire.Pdf.Bookmarks;

2. Core Implementation Logic

The overall process can be broken down into four steps:

  1. Load the target PDF document.

  2. Retrieve the document’s PdfBookmarkCollection.

  3. Recursively traverse each bookmark and its child bookmarks to extract titles and display styles.

  4. Write the extracted content to a text file.

2.1 Load the Document and Retrieve the Bookmark Collection

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(@"D:\test.pdf");
PdfBookmarkCollection bookmarks = pdf.Bookmarks;

The Bookmarks property returns a collection containing the top-level bookmarks. If the document has no bookmarks, the Count will be 0.

2.2 Recursively Traverse the Bookmark Tree

The bookmark structure is a typical tree: each bookmark node may contain a collection of child bookmarks (accessible via the Count property and indexer). We design two methods:

  • GetBookmarks: Handles top-level bookmarks, initializes a StringBuilder, and starts the recursion.

  • GetChildBookmark: Recursively processes child bookmarks.

public static void GetBookmarks(PdfBookmarkCollection bookmarks, string result)
{
    StringBuilder content = new StringBuilder();
    if (bookmarks.Count > 0)
    {
        content.AppendLine("Pdf bookmarks:");
        foreach (PdfBookmark parentBookmark in bookmarks)
        {
            // Retrieve the title
            content.AppendLine(parentBookmark.Title);
            // Retrieve the display style (e.g., regular, bold, italic, etc.)
            content.AppendLine(parentBookmark.DisplayStyle.ToString());
            // Recursively process child bookmarks
            GetChildBookmark(parentBookmark, content);
        }
    }
    File.WriteAllText(result, content.ToString());
}

Recursive method:

public static void GetChildBookmark(PdfBookmark parentBookmark, StringBuilder content)
{
    if (parentBookmark.Count > 0)
    {
        foreach (PdfBookmark childBookmark in parentBookmark)
        {
            content.AppendLine(childBookmark.Title);
            content.AppendLine(childBookmark.DisplayStyle.ToString());
            GetChildBookmark(childBookmark, content);
        }
    }
}
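The two methods above write every title at the same level, so the nesting is lost in the output. One possible refinement is to thread a depth parameter through the recursion and indent by it. The following is a minimal sketch of that idea, not part of the original code: Node is a hypothetical stand-in for PdfBookmark (it is not a Spire.PDF type) so the example runs without the library, and OutlinePrinter and the depth parameter are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for PdfBookmark: a title plus child nodes.
// It is not part of Spire.PDF; it only makes the sketch runnable on its own.
public class Node
{
    public string Title { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string title, params Node[] children)
    {
        Title = title;
        Children.AddRange(children);
    }
}

public static class OutlinePrinter
{
    // Appends each title indented by two spaces per nesting level,
    // then recurses into the children with depth + 1.
    public static void Append(IEnumerable<Node> nodes, StringBuilder content, int depth = 0)
    {
        foreach (Node node in nodes)
        {
            content.Append(' ', depth * 2).AppendLine(node.Title);
            Append(node.Children, content, depth + 1);
        }
    }

    public static void Main()
    {
        var outline = new List<Node>
        {
            new Node("Chapter 1 Introduction",
                new Node("1.1 Background"),
                new Node("1.2 Objectives")),
            new Node("Chapter 2 Implementation")
        };

        var content = new StringBuilder();
        Append(outline, content);
        Console.Write(content.ToString());
    }
}
```

With Spire.PDF, the same change would presumably amount to adding an int depth parameter to GetChildBookmark and passing depth + 1 in the recursive call.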

2.3 Complete Code Example

Below is a complete console application example that outputs bookmark information to a file named GetPdfBookmarks.txt.

using System;
using System.IO;
using System.Text;
using Spire.Pdf;
using Spire.Pdf.Bookmarks;

namespace GetBookmark
{
    internal class Program
    {
        static void Main(string[] args)
        {
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile(@"D:\testp\test.pdf");

            PdfBookmarkCollection bookmarks = pdf.Bookmarks;
            string result = "GetPdfBookmarks.txt";
            GetBookmarks(bookmarks, result);

            Console.WriteLine("Bookmark extraction completed. The results have been saved to: " + result);
        }

        public static void GetBookmarks(PdfBookmarkCollection bookmarks, string result)
        {
            StringBuilder content = new StringBuilder();
            if (bookmarks.Count > 0)
            {
                content.AppendLine("Pdf bookmarks:");
                foreach (PdfBookmark parentBookmark in bookmarks)
                {
                    content.AppendLine(parentBookmark.Title);
                    content.AppendLine(parentBookmark.DisplayStyle.ToString());
                    GetChildBookmark(parentBookmark, content);
                }
            }
            else
            {
                content.AppendLine("The PDF document does not contain any bookmarks.");
            }
            File.WriteAllText(result, content.ToString());
        }

        public static void GetChildBookmark(PdfBookmark parentBookmark, StringBuilder content)
        {
            if (parentBookmark.Count > 0)
            {
                foreach (PdfBookmark childBookmark in parentBookmark)
                {
                    content.AppendLine(childBookmark.Title);
                    content.AppendLine(childBookmark.DisplayStyle.ToString());
                    GetChildBookmark(childBookmark, content);
                }
            }
        }
    }
}

3. Output Format Description

Each bookmark in the generated text file is represented by two lines: the first line is the title, and the second line is the display style. For example:

Pdf bookmarks:
Chapter 1 Introduction
Regular
1.1 Background
Bold
1.2 Objectives
Italic
Chapter 2 Implementation
Regular
2.1 Environment Setup
Regular

DisplayStyle is an enumeration with the following possible values:

  • Regular: Normal text

  • Bold: Bold

  • Italic: Italic

The output will vary depending on the actual bookmark styles defined in the PDF document.
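Because every bookmark occupies exactly two lines after the header, the file can be read back in pairs. The sketch below illustrates this under two assumptions: the first line is the "Pdf bookmarks:" header written by GetBookmarks, and no title contains a line break. OutputReader and Parse are illustrative names, not part of the original program.

```csharp
using System;
using System.Collections.Generic;

public static class OutputReader
{
    // Parses lines laid out as: optional header, then (title, style) pairs.
    public static List<(string Title, string Style)> Parse(IReadOnlyList<string> lines)
    {
        var entries = new List<(string Title, string Style)>();

        // Skip the "Pdf bookmarks:" header when it is present.
        int start = lines.Count > 0 && lines[0] == "Pdf bookmarks:" ? 1 : 0;

        // Consume two lines at a time; an unpaired trailing line is ignored.
        for (int i = start; i + 1 < lines.Count; i += 2)
        {
            entries.Add((lines[i], lines[i + 1]));
        }
        return entries;
    }

    public static void Main()
    {
        string[] lines =
        {
            "Pdf bookmarks:",
            "Chapter 1 Introduction", "Regular",
            "1.1 Background", "Bold"
        };

        foreach (var entry in Parse(lines))
        {
            Console.WriteLine($"{entry.Title} [{entry.Style}]");
        }
    }
}
```

For a real file, the array returned by File.ReadAllLines("GetPdfBookmarks.txt") can be passed to Parse directly.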

4. Notes and Extensions

4.1 Bookmarks May Be Empty

If the PDF has no bookmarks, bookmarks.Count will be 0. In this case, the complete example in section 2.3 writes a message to the file instead of leaving it empty.

4.2 Retrieving Target Page Numbers and Actions

The above example only retrieves the title and style. If you also need to get the target page number a bookmark links to, you can use the PdfBookmark.Action property (be sure to check the action type). For example:

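// PdfGoToAction is expected in the Spire.Pdf.Actions namespace (add: using Spire.Pdf.Actions;).
// "pdf" is the loaded PdfDocument; pass it into the method if it is not already in scope.
if (parentBookmark.Action is PdfGoToAction goToAction && goToAction.Destination != null)
{
    // The index is treated as zero-based, so add 1 for a human-readable page number.
    int pageIndex = pdf.Pages.IndexOf(goToAction.Destination.Page);
    content.AppendLine($"Target page: {pageIndex + 1}");
}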

Free Spire.PDF provides fairly comprehensive support for Action, so you can extend the functionality based on your specific needs.

4.3 Performance Considerations

For PDFs containing thousands of bookmarks, recursive traversal typically does not cause noticeable performance issues. However, if the output is very large or extraction runs frequently, consider writing through a StreamWriter as each bookmark is visited instead of accumulating the whole text in a StringBuilder, which reduces peak memory usage.
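A minimal sketch of the streaming variant follows. It assumes the same two-lines-per-bookmark format; OutlineNode is a hypothetical stand-in for PdfBookmark (not a Spire.PDF type) so that the example runs without the library, and StreamingExport is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for PdfBookmark (not a Spire.PDF type).
public class OutlineNode
{
    public string Title { get; set; } = "";
    public string Style { get; set; } = "Regular";
    public List<OutlineNode> Children { get; } = new List<OutlineNode>();
}

public static class StreamingExport
{
    // Writes each node as soon as it is visited instead of buffering all text.
    // Accepting a TextWriter lets the caller pass a StreamWriter or a StringWriter.
    public static void Write(IEnumerable<OutlineNode> nodes, TextWriter writer)
    {
        foreach (OutlineNode node in nodes)
        {
            writer.WriteLine(node.Title);
            writer.WriteLine(node.Style);
            Write(node.Children, writer);
        }
    }

    public static void Main()
    {
        var chapter = new OutlineNode { Title = "Chapter 1 Introduction" };
        chapter.Children.Add(new OutlineNode { Title = "1.1 Background", Style = "Bold" });

        // The using block flushes and closes the file even if an exception occurs.
        using (var writer = new StreamWriter("GetPdfBookmarks.txt"))
        {
            writer.WriteLine("Pdf bookmarks:");
            Write(new[] { chapter }, writer);
        }
    }
}
```

Only the writer's internal buffer is held in memory at any time, so the cost no longer grows with the total size of the output.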

4.4 Encoding Handling

File.WriteAllText uses UTF-8 encoding by default. If you need a different encoding (such as GB2312), pass an Encoding to the File.WriteAllText(path, contents, encoding) overload, or use a StreamWriter created with that encoding.
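The sketch below shows the overload with an explicit encoding. It uses UTF-16 (Encoding.Unicode) rather than GB2312 because UTF-16 is available on every .NET runtime; EncodingDemo, Save, and the temporary file name are illustrative. On .NET Core and .NET 5+, code-page encodings such as GB2312 generally have to be enabled by registering CodePagesEncodingProvider first, as noted in the comments.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Writes text with an explicit encoding via the File.WriteAllText overload.
    public static void Save(string path, string text, Encoding encoding)
    {
        File.WriteAllText(path, text, encoding);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "GetPdfBookmarks-utf16.txt");

        // Sample title in Chinese ("Chapter 1 Introduction") to exercise non-ASCII text.
        string text = "Pdf bookmarks:" + Environment.NewLine + "第一章 简介" + Environment.NewLine;

        // For GB2312 on .NET Core/.NET 5+, register the code-page provider first:
        //   Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        //   Encoding gb2312 = Encoding.GetEncoding("GB2312");
        Save(path, text, Encoding.Unicode);

        // Read the file back with the same encoding to check the round trip.
        bool same = File.ReadAllText(path, Encoding.Unicode) == text;
        Console.WriteLine("Round trip preserved the text: " + same);
    }
}
```

Whichever encoding is chosen, the file has to be read with the same one, otherwise non-ASCII bookmark titles will appear garbled.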

5. Summary

This article demonstrates how to fully extract multi-level bookmark information from a PDF document using a free .NET library. The key points include:

  • Accessing the root bookmark collection via PdfDocument.Bookmarks.

  • Recursively traversing PdfBookmark nodes using the Count property and indexer.

  • Reading the Title and DisplayStyle properties.

  • Writing the structured data to a text file.

This approach does not rely on Adobe Acrobat or any other GUI tools, making it ideal for integration into backend services or document processing pipelines. Developers can further extend this approach to retrieve bookmark page numbers, zoom settings, or even modify the bookmark structure.



Written by dbx

Author at ITProgram