Java PDFBox: extracting PDF titles, or how to get all bookmarks in a PDF file using PDFBox in Java (blog of 胖芋圆的耳饰)

I am new to Apache PDFBox. I want to extract all the bookmarks in a PDF file using the PDFBox library in Java. Any idea how to extract them?

From the PrintBookmarks example in the PDFBox source code download:

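// PDFBox 2.x API; in PDFBox 3.x, PDDocument.load was replaced by Loader.loadPDF
PDDocument document = PDDocument.load(new File("..."));
PDDocumentOutline outline = document.getDocumentCatalog().getDocumentOutline();
// getDocumentOutline() returns null when the PDF has no bookmarks
if (outline != null)
{
    printBookmark(outline, "");
}
document.close();

(...)

public void printBookmark(PDOutlineNode bookmark, String indentation) throws IOException
{
    // Bookmarks form a first-child / next-sibling tree: visit each child, then recurse
    PDOutlineItem current = bookmark.getFirstChild();
    while (current != null)
    {
        System.out.println(indentation + current.getTitle());
        printBookmark(current, indentation + "    ");
        current = current.getNextSibling();
    }
}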
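The recursion in the answer works because a PDF outline is stored as a first-child / next-sibling tree. Here is a minimal sketch of that same traversal, assuming a hypothetical stand-in `Node` class (not part of PDFBox) so that it runs with only the JDK. The names `OutlineSketch`, `Node`, `addChild` and `collect` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PDFBox's outline classes; none of this is PDFBox API.
class OutlineSketch {

    static final class Node {
        final String title;
        Node firstChild;
        Node nextSibling;

        Node(String title) {
            this.title = title;
        }

        // Appends a child at the end of this node's sibling chain and returns it.
        Node addChild(String childTitle) {
            Node child = new Node(childTitle);
            if (firstChild == null) {
                firstChild = child;
            } else {
                Node last = firstChild;
                while (last.nextSibling != null) {
                    last = last.nextSibling;
                }
                last.nextSibling = child;
            }
            return child;
        }
    }

    // Same shape as printBookmark: record each child, recurse into it, move to the next sibling.
    static void collect(Node parent, String indentation, List<String> out) {
        Node current = parent.firstChild;
        while (current != null) {
            out.add(indentation + current.title);
            collect(current, indentation + "    ", out);
            current = current.nextSibling;
        }
    }

    public static void main(String[] args) {
        Node root = new Node(null); // plays the role of the document outline
        Node chapter1 = root.addChild("Chapter 1");
        chapter1.addChild("Section 1.1");
        chapter1.addChild("Section 1.2");
        root.addChild("Chapter 2");

        List<String> lines = new ArrayList<>();
        collect(root, "", lines);
        lines.forEach(System.out::println);
    }
}
```

Collecting into a list rather than printing directly is a design choice for this sketch only; it makes the result easy to inspect or reuse, and the walk order is the same as in the answer's code.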

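This article describes how to read the text content of a PDF file in a Java application. (Reading images is also supported; see the article "Java 提取 PDF 文档中的图片", on extracting images from PDF documents in Java.) To read a PDF in a Java application, we can use a third-party PDF component; the one used here is the free Java PDF component Free Spire.PDF for Java. Before using the code below, download and unzip the Free Spire.PDF for Java package, then, from the lib folder, import Sp...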

In this tutorial, we will learn how to use the Apache PDFBox library to add bookmark items to a PDF document in Java. The post also shows how to add bookmarks to both a new PDF document and an existing PDF file. Apache PDFBox library overview: Apache PDFBox is an open-source Java library for working with PDF documents. You can find more information about the project at pdfbox.apache.org. Adding the Apache PDFBox dependency: if you use the Gradle build tool, add the following dependency to your build.gradle file...

1. Use the open-source pdfbox API: https://pdfbox.apache.org/. Characteristics: free and powerful, but parsing Chinese text may produce garbled characters, and the extracted layout is somewhat messy and less polished than that of Chinese-developed parsers. It can modify, add to, and delete from a PDF according to a specified template...

import java.io.File;
import java.io.UnsupportedEncodingException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;

public class ReadPDF {
    public static void main(String[] args) {
        // Target file or folder to read
        String pathname = "F:\\读取 PDF 中的信息";
        List<String> list = new ArrayList<>();
        // readFile, getStr and insertSQL are helper methods not shown in this excerpt
        readFile(pathname, list);
        for (int j = 0; j < list.size(); j++) {
            // Create a PdfDocument instance
            PdfDocument doc = new PdfDocument();
            // Load the PDF file
            doc.loadFromFile(list.get(j));
            StringBuilder sb = new StringBuilder();
            PdfPageBase page;
            // Iterate over the PDF pages and collect their text
            for (int i = 0; i < doc.getPages().getCount(); i++) {
                page = doc.getPages().get(i);
                sb.append(page.extractText(true));
            }
            String str = getStr(sb.toString());
            System.out.println(str);
            // Split on the full-width semicolon; the first two fields are stripped from the rest
            String[] arr = str.split("；");
            String gh = "";
            String gw = "";
            for (int i = 0; i < arr.length; i++) {
                arr[i] = arr[i].trim();
                if (i == 0) {
                    gh = arr[i];
                } else if (i == 1) {
                    gw = arr[i];
                } else {
                    arr[i] = arr[i].replace(gh, "").replace(gw, "");
                }
            }
            insertSQL(arr);
            doc.close();
        }
    }

    public static String getStr2(String str) {
        try {
            byte[] bs = str.getBytes("utf-8");
            for (int i = 0; i < bs.length; i++) {
                byte b = bs[i];
                if (b == 0) {
                    bs[i] = 9;
                    str = ...

If you would rather not use PDFBox, you can read a PDF file's content with iText, a third-party PDF library (it is not bundled with the JDK). The following simple example shows how to extract the text content of a PDF with the iText 5 API:

```java
import java.io.IOException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class ReadPDF {
    public static void main(String[] args) throws IOException {
        // Open the PDF file
        PdfReader reader = new PdfReader("example.pdf");
        // Collect the text of every page (page numbers start at 1)
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            text.append(PdfTextExtractor.getTextFromPage(reader, i));
        }
        System.out.println(text);
        // Close the document
        reader.close();
    }
}
```

This code prints the text content of the PDF file. Replace `example.pdf` with the path of the PDF file you want to read. iText also offers other PDF parsing features, such as image extraction.