Read PDF in Java Using iText

In this post we’ll see a Java program that reads a PDF document using the iText library.

To learn more about the iText library and other PDF examples, see this post: Generating PDF in Java Using iText Tutorial

Reading PDFs using iText

To read a PDF using iText, follow these steps.

  1. Create a PdfReader instance and wrap it in a PdfDocument.
  2. Get the number of pages in the PDF that has to be read.
  3. Iterate through pages and extract the content of each page using PdfTextExtractor.

PDF used for reading.

List Items iText

Java Program

import java.io.IOException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

public class ReadPDF {
  public static final String READ_PDF = "F://knpcode//result//List.pdf";

  public static void main(String[] args) {
    // Wrap a PdfReader in a PdfDocument; try-with-resources closes the
    // document even if extraction fails part-way through
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(READ_PDF))) {
      // Get the number of pages in the PDF
      int noOfPages = pdfDoc.getNumberOfPages();
      System.out.println("Extracted content of PDF---- ");
      // iText page numbers start at 1
      for (int i = 1; i <= noOfPages; i++) {
        // Extract the content of each page
        String contentOfPage = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
        System.out.println(contentOfPage);
      }
    } catch (IOException e) {
      System.out.println("Exception occurred " + e.getMessage());
    }
  }
}

Output

Extracted content of PDF---- 
List with Roman symbols
i. Item1
ii. Item2
iii. Item3
List with English letter symbols
A. Item1
B. Item2
C. Item3
List with Greek letter symbols
α. Item1
β. Item2
γ. Item3
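
The program above prints each page as it is extracted. If the text is needed as a single String instead (for searching or saving, say), the same page loop can feed a StringBuilder. The following is only a minimal sketch: the class PageTextCollector, the method collectText and the page-separator lines are hypothetical names made up for illustration and are not part of iText. The per-page extraction is passed in as a function so that the sketch runs without a PDF file; main uses stand-in strings rather than a real document.

```java
import java.util.function.IntFunction;

class PageTextCollector {

  // Hypothetical helper: joins the text of pages 1..pageCount into one String.
  // pageText stands in for the per-page extraction call.
  public static String collectText(int pageCount, IntFunction<String> pageText) {
    StringBuilder sb = new StringBuilder();
    // Page numbers start at 1, matching the loop in the program above
    for (int i = 1; i <= pageCount; i++) {
      sb.append("--- Page ").append(i).append(" ---\n");
      sb.append(pageText.apply(i)).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Stand-in page contents (an assumption, not read from a real PDF)
    String[] pages = {"List with Roman symbols", "List with Greek letter symbols"};
    String all = collectText(pages.length, i -> pages[i - 1]);
    System.out.print(all);
  }
}
```

With iText on the classpath, the stand-in lambda could be replaced by the extraction call from the program above, for example collectText(pdfDoc.getNumberOfPages(), i -> PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i))).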


That’s all for the topic Read PDF in Java Using iText. If something is missing or you have something to share about the topic please write a comment.

