Read PDF in Java Using iText

In this post we’ll see a Java program to read a PDF document using the iText library.

To know more about the iText library and other PDF examples, check this post: Generating PDF in Java Using iText Tutorial

Reading PDFs using iText

To read a PDF using iText, follow these steps.

  1. Create a PdfReader instance and wrap it in a PdfDocument.
  2. Get the number of pages in the PDF to be read.
  3. Iterate through the pages (page numbers start at 1) and extract the content of each page using PdfTextExtractor.
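The loop in step 3 runs from page 1 to the page count inclusive, because iText numbers pages from 1 rather than 0. The sketch below shows that loop shape using only the JDK, so it can be tried without the iText jar. The class `PageLoopSketch`, the method `extractAll` and its `IntFunction<String>` parameter are hypothetical names for illustration, not part of iText; the function stands in for the actual text extraction call.

```java
import java.util.List;
import java.util.function.IntFunction;

public class PageLoopSketch {

    // Joins the text of pages 1..pageCount, one page after another.
    // pageText is a hypothetical stand-in for
    // i -> PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i)).
    static String extractAll(int pageCount, IntFunction<String> pageText) {
        StringBuilder sb = new StringBuilder();
        // Page numbers start at 1, so the loop runs from 1 to pageCount inclusive.
        for (int i = 1; i <= pageCount; i++) {
            if (i > 1) {
                sb.append('\n'); // line break between consecutive pages
            }
            sb.append(pageText.apply(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fake two-page "document": list index 0 holds page 1.
        List<String> pages = List.of("Text of page 1", "Text of page 2");
        System.out.println(extractAll(pages.size(), i -> pages.get(i - 1)));
    }
}
```

With iText on the classpath, the same helper could be called as `extractAll(pdfDoc.getNumberOfPages(), i -> PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i)))`.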

PDF used for reading:

[Image: List Items iText]

Java Program

import java.io.IOException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

public class ReadPDF {
  public static final String READ_PDF = "F://knpcode//result//List.pdf";

  public static void main(String[] args) {
    // try-with-resources closes the PdfDocument (and, by default, its PdfReader)
    // even if an exception is thrown while reading
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(READ_PDF))) {
      // get the number of pages in PDF
      int noOfPages = pdfDoc.getNumberOfPages();
      System.out.println("Extracted content of PDF---- ");
      // page numbers in iText start at 1
      for (int i = 1; i <= noOfPages; i++) {
        // Extract content of each page
        String contentOfPage = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
        System.out.println(contentOfPage);
      }
    } catch (IOException e) {
      System.out.println("Exception occurred " + e.getMessage());
    }
  }
}

Output

Extracted content of PDF---- 
List with Roman symbols
i. Item1
ii. Item2
iii. Item3
List with English letter symbols
A. Item1
B. Item2
C. Item3
List with Greek letter symbols
α. Item1
β. Item2
γ. Item3
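The extracted content is plain text with line breaks, as the output above shows, so any structure such as which items belong to which list has to be recovered by your own code. As a hedged example, the sketch below splits such text into lines and groups item lines under the heading line before them. It assumes, as holds for this sample PDF only, that heading lines start with "List with"; the class `GroupListItems` and method `group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupListItems {

    // Groups item lines under the most recent heading line, keeping heading order.
    // Assumption: headings start with "List with" (true for this sample PDF only).
    // Lines that appear before the first heading are ignored.
    static Map<String, List<String>> group(String text) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : text.split("\\R")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            if (trimmed.startsWith("List with")) {
                current = new ArrayList<>();
                groups.put(trimmed, current);
            } else if (current != null) {
                current.add(trimmed);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "List with Roman symbols\ni. Item1\nii. Item2\n"
                + "List with English letter symbols\nA. Item1\nB. Item2";
        group(text).forEach((heading, items) -> System.out.println(heading + " -> " + items));
    }
}
```

A rule keyed on the heading text is fragile; for other documents the grouping rule would need to match however that PDF's headings actually read.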


That’s all for the topic Read PDF in Java Using iText. If something is missing or you have something to share about the topic, please write a comment.



